
Merge pull request from brancz/exposing-metrics

Documentation: exposing metrics
Frederic Branczyk 2017-01-24 16:28:02 +01:00 committed by GitHub
commit baf9688008
4 changed files with 83 additions and 24 deletions

View file

@@ -51,12 +51,12 @@ A healthy node would be one that has joined the existing mesh network and has
been communicated the state that it missed while that particular instance was
down for the upgrade.
-Currently there is no way to tell whether an Alertmanger instance is healthy
Currently there is no way to tell whether an Alertmanager instance is healthy
under the above conditions. There are discussions of using vector clocks to
resolve merges in the above mentioned situation, and ensure on a best effort
basis that joining the network was successful.
-> Note that single instance Alertmanger setups will therefore not have zero
> Note that single instance Alertmanager setups will therefore not have zero
> downtime on deployments.
The current implementation of rolling deployments simply decides based on the

View file

@@ -0,0 +1,48 @@
# Exposing Metrics
There are a number of
[applications](https://prometheus.io/docs/instrumenting/exporters/#directly-instrumented-software)
that are natively instrumented with Prometheus metrics; these applications
simply expose their metrics through an HTTP server.
The Prometheus developers and the community are maintaining [client
libraries](https://prometheus.io/docs/instrumenting/clientlibs/#client-libraries)
for various languages. If you want to monitor your own applications and
instrument them natively, chances are there is already a client library for
your language.
Not all software is natively instrumented with Prometheus metrics; much of it
does, however, record metrics in some other form. For these kinds of
applications there are so-called
[exporters](https://prometheus.io/docs/instrumenting/exporters/#third-party-exporters).
Exporters can generally be divided into two categories:
* Instance exporters: These expose metrics about a single instance of an
application, for example the HTTP requests that a single HTTP server has
served. These exporters are deployed as a
[side-car](http://blog.kubernetes.io/2015/06/the-distributed-system-toolkit-patterns.html)
container in the same pod as the actual instance of the respective application
(see the sketch after this list). A real-life example is the [`dnsmasq` metrics
sidecar](https://github.com/kubernetes/dns/blob/master/docs/sidecar/README.md),
which converts the proprietary metrics format communicated over the DNS
protocol by dnsmasq to the Prometheus exposition format and exposes it on an
HTTP server.
* Cluster-state exporters: These expose metrics about an entire system; the
metrics could be native to the environment the application constructs, for
example the number of 3D objects in a game, or metrics about a Kubernetes
deployment. These exporters are typically deployed as a normal Kubernetes
deployment, but this can vary depending on the nature of the particular
exporter. A real-life example of this is the
[`kube-state-metrics`](https://github.com/kubernetes/kube-state-metrics)
exporter, which exposes metrics about the cluster state of a Kubernetes
cluster.
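The side-car pattern from the first category can be sketched as a pod that runs
the application container next to an exporter container. The following is a
minimal, hypothetical example; the image names, port numbers, and the `metrics`
port name are assumptions for illustration, not taken from the dnsmasq sidecar
above.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-app
  labels:
    app: example-app
spec:
  containers:
  # The actual application; it does not expose Prometheus metrics itself.
  - name: app
    image: example/app:1.0            # hypothetical image
    ports:
    - containerPort: 8080
  # Side-car exporter that translates the application's own metrics into the
  # Prometheus exposition format and serves them over HTTP.
  - name: metrics-exporter
    image: example/app-exporter:1.0   # hypothetical image
    ports:
    - name: metrics                   # named port that a Service can reference
      containerPort: 9102
```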
Lastly, in some cases it is not a viable option to expose metrics via an HTTP
server. For example, a `CronJob` may only run for a few seconds - not long
enough for Prometheus to be able to scrape the HTTP endpoint. The Pushgateway
was developed to be able to collect metrics in a scenario like that; however,
it is highly recommended not to use the Pushgateway if possible. Read more
about when to use the Pushgateway and alternative strategies here:
https://prometheus.io/docs/practices/pushing/#should-i-be-using-the-pushgateway.
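If the Pushgateway route is chosen anyway, the push can happen directly from
the job's container. The sketch below assumes a Pushgateway reachable inside
the cluster at `pushgateway:9091`; the `apiVersion`, schedule, image, and
metric name are illustrative, and the recommendation above still applies.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-job
spec:
  schedule: "0 3 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: job
            image: curlimages/curl:8.7.1   # any image that ships curl works
            command:
            - /bin/sh
            - -c
            # After the actual work, push a metric to the Pushgateway under
            # the job label "nightly_job".
            - |
              # ... do the actual batch work here ...
              echo "nightly_job_last_success_timestamp $(date +%s)" \
                | curl --data-binary @- http://pushgateway:9091/metrics/job/nightly_job
```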

View file

@@ -111,6 +111,7 @@ it brought up as data sources in potential Grafana deployments.
Prometheus instances are deployed with default values for requested and maximum
resource usage of CPU and memory. This will be made configurable in the `Prometheus`
TPR eventually.
Prometheus comes with a variety of configuration flags for its storage engine that
have to be tuned for better performance in large Prometheus servers. It will be the
operator's job to tune those correctly to be aligned with the experienced load.

View file

@@ -2,18 +2,41 @@
The `ServiceMonitor` third party resource (TPR) allows to declaratively define
how a dynamic set of services should be monitored. Which services are selected
-to be monitored with the desired configuration is defined using label selections.
-This allows to dynamically express monitoring without having to update additional
-configuration for services that follow known monitoring patterns.
to be monitored with the desired configuration is defined using label
selections. This allows an organization to introduce conventions around how
metrics are exposed, and then, following these conventions, new services are
automatically discovered without the need to reconfigure the system.
-A service may expose one or more service ports, which are backed by a list
-of multiple endpoints that point to a pod in the common case.
## Design
-In the `endpoints` section of the TPR, we can configure which ports of these
-endpoints we want to scrape for metrics and with which paramters. For advanced use
-cases one may want to monitor ports of backing pods, which are not directly part
-of the service endpoints. This is also made possible by the Prometheus Operator.
For Prometheus to monitor any application within Kubernetes an `Endpoints`
object needs to exist. `Endpoints` objects are essentially lists of IP
addresses. Typically an `Endpoints` object is populated by a `Service` object.
A `Service` object discovers `Pod`s by a label selector and adds those to the
`Endpoints` object.
A `Service` may expose one or more service ports, which are backed by a list of
multiple endpoints that point to a `Pod` in the common case. This is reflected
in the respective `Endpoints` object as well.
The `ServiceMonitor` object introduced by the Prometheus Operator in turn
discovers those `Endpoints` objects and configures Prometheus to monitor those
`Pod`s.
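For illustration, a `Service` such as the following (the names, labels, and
port are made up) selects the example application's `Pod`s and thereby causes
Kubernetes to maintain an `Endpoints` object listing their IP addresses:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: example-app
  labels:
    team: frontend            # label that a ServiceMonitor can select on
spec:
  selector:
    app: example-app          # selects the Pods; their IPs populate Endpoints
  ports:
  - name: metrics             # named port, referenced by the ServiceMonitor below
    port: 9102
    targetPort: 9102
```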
The `endpoints` section of the `ServiceMonitorSpec` is used to configure which
ports of these `Endpoints` are going to be scraped for metrics, and with which
parameters. For advanced use cases one may want to monitor ports of backing
`Pod`s, which are not directly part of the service endpoints. Therefore, the
entries specified in the `endpoints` section are used strictly as given.
> Note: `endpoints` (lowercase) is the TPR field, while `Endpoints`
> (capitalized) is the Kubernetes object kind.
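A minimal `ServiceMonitor` matching the example `Service` above could look
roughly like the sketch below. The `monitoring.coreos.com/v1alpha1` API version
and the concrete field values are assumptions for illustration; the
authoritative field list is the specification below.

```yaml
apiVersion: monitoring.coreos.com/v1alpha1   # assumed alpha API group of the TPR
kind: ServiceMonitor
metadata:
  name: example-app
spec:
  selector:
    matchLabels:
      team: frontend          # selects Service objects carrying this label
  endpoints:
  - port: metrics             # the named service port to scrape
    interval: 30s             # how often to scrape these endpoints
```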
While `ServiceMonitor`s must live in the same namespace as the `Prometheus`
TPR, discovered targets may come from any namespace. This is important to allow
cross-namespace monitoring use cases, e.g. for meta-monitoring. Using the
`namespaceSelector` of the `ServiceMonitorSpec`, one can restrict the
namespaces the `Endpoints` objects are allowed to be discovered from.
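As a sketch, using the `namespaceSelector` fields listed in the specification
below, restricting discovery to a single (made-up) namespace would look roughly
like this:

```yaml
spec:
  namespaceSelector:
    matchNames:
    - frontend-production    # only discover Endpoints objects in this namespace
  # or, to discover from all namespaces:
  # namespaceSelector:
  #   any: true
```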
## Specification
@@ -61,16 +84,3 @@ of the service endpoints. This is also made possible by the Prometheus Operator.
| any | Match any namespace | false | bool | false |
| matchNames | Explicit list of namespace names to select | false | string array | |
-## Current state and roadmap
-### Namespaces
-While `ServiceMonitor`s must live in the same namespace as the `Prometheus` TPR,
-discovered targets may come from any namespace. This is important to allow cross-namespace
-monitoring use cases, e.g. for meta-monitoring.
-Currently, targets are always discovered from all namespaces. In the future, the
-`ServiceMonitor` should allow to restrict this to one or more namespaces.
-How such a configuration would look like, i.e. explicit namespaces, selection by labels,
-or both, and what the default behavior should be is still up for discussion.