mirror of https://github.com/prometheus-operator/prometheus-operator.git synced 2025-04-21 03:38:43 +00:00

Documentation/user-guides: add alerting guide

Frederic Branczyk 2017-03-02 16:19:52 +01:00
parent 1804e7fb42
commit 3f41c50407
9 changed files with 200 additions and 24 deletions

View file

@ -0,0 +1,165 @@
# Alerting
This guide assumes you have a basic understanding of the `Prometheus` resource
and have read the [getting started](../getting-started/getting-started.md) guide.

Besides the `Prometheus` and `ServiceMonitor` resources, the Prometheus Operator
also introduces the `Alertmanager` resource, which declaratively describes an
Alertmanager cluster. Before deploying an Alertmanager cluster, it
is important to understand the contract between Prometheus and Alertmanager.
The Alertmanager's features include:

* Deduplicating alerts fired by Prometheus
* Silencing alerts
* Routing and sending grouped notifications via providers (PagerDuty, OpsGenie, ...)

Prometheus' configuration includes so-called rule files, which contain the
[alerting rules](https://prometheus.io/docs/alerting/rules/). When an alerting
rule triggers, Prometheus fires that alert against *all* Alertmanager instances
on *every* rule evaluation interval. The Alertmanager instances communicate with
each other about which notifications have already been sent out. You can read
more about why these systems have been designed this way in the [High Availability
scheme description](../../docs/high-availability.md).
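
To illustrate why firing the same alert repeatedly, and from several Prometheus replicas, still yields a single alert, the sketch below collapses alerts that carry identical label sets. It is a simplified model written for this guide, not Alertmanager's actual implementation. The function names are hypothetical, and it assumes every replica attaches exactly the same labels to the alert.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

LabelSet = FrozenSet[Tuple[str, str]]


def fingerprint(labels: Dict[str, str]) -> LabelSet:
    """Identify an alert by its full label set, independent of key order."""
    return frozenset(labels.items())


def deduplicate(received: Iterable[Dict[str, str]]) -> Dict[LabelSet, Dict[str, str]]:
    """Keep one entry per distinct label set."""
    active: Dict[LabelSet, Dict[str, str]] = {}
    for labels in received:
        active[fingerprint(labels)] = labels
    return active


# Two Prometheus replicas each fire the same alert on two evaluation cycles.
received = [
    {"alertname": "ExampleAlert"} for _replica in range(2) for _cycle in range(2)
]
active = deduplicate(received)
print(len(received), "received,", len(active), "active")  # prints: 4 received, 1 active
```
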
An example Alertmanager cluster could look like this:

[embedmd]:# (examples/alertmanager-example.yaml)
```yaml
apiVersion: monitoring.coreos.com/v1alpha1
kind: Alertmanager
metadata:
  name: example
spec:
  replicas: 3
```
However, the Alertmanager instances will not start up unless a valid
configuration is given. The following example configuration does not do
anything useful, as it sends notifications to a non-existent `webhook`, but it
allows the Alertmanager to start up. Read more about how to configure the
Alertmanager in the [upstream
documentation](https://prometheus.io/docs/alerting/configuration/).

[embedmd]:# (examples/alertmanager-example-config.yaml)
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: alertmanager-example
data:
  alertmanager.yaml: |-
    global:
      resolve_timeout: 5m
    route:
      group_by: ['job']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 12h
      receiver: 'webhook'
    receivers:
    - name: 'webhook'
      webhook_configs:
      - url: 'http://alertmanagerwh:30500/'
```
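
The `group_by: ['job']` setting in the configuration above means that alerts sharing the same `job` label are bundled into one notification. The following sketch models only that grouping step. The alert names and the `group_alerts` helper are hypothetical, and the timing settings (`group_wait`, `group_interval`, `repeat_interval`) are left out entirely.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

GroupKey = Tuple[Tuple[str, str], ...]


def group_alerts(
    alerts: Sequence[Dict[str, str]], group_by: Sequence[str]
) -> Dict[GroupKey, List[Dict[str, str]]]:
    """Bucket alerts by the values of the labels listed in group_by."""
    groups: Dict[GroupKey, List[Dict[str, str]]] = defaultdict(list)
    for labels in alerts:
        # A missing label is treated as an empty value in this sketch.
        key = tuple((name, labels.get(name, "")) for name in group_by)
        groups[key].append(labels)
    return dict(groups)


# Hypothetical alerts, used only for illustration.
alerts = [
    {"alertname": "HighLatency", "job": "frontend"},
    {"alertname": "HighErrorRate", "job": "frontend"},
    {"alertname": "InstanceDown", "job": "backend"},
]
groups = group_alerts(alerts, ["job"])
for key, members in groups.items():
    print(key, [alert["alertname"] for alert in members])
```

With these inputs the three alerts fall into two groups, one per `job` value, so under this model two notifications would be sent rather than three.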
To be able to view the web UI, expose it via a `Service`. A simple way to do
this is to use a `Service` of type `NodePort`.

[embedmd]:# (examples/alertmanager-example-service.yaml)
```yaml
apiVersion: v1
kind: Service
metadata:
  name: alertmanager-example
spec:
  type: NodePort
  ports:
  - name: web
    nodePort: 30903
    port: 9093
    protocol: TCP
    targetPort: web
  selector:
    alertmanager: example
```
Once the `Service` is created, the web UI is accessible via any node's IP on
port `30903`.
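
The `Service` above finds its backing pods through the `selector` field: a pod is selected when it carries every label listed there with the same value. The sketch below shows that equality-based matching in isolation. The pod names and their label sets are assumptions made for illustration, not values read from a cluster.

```python
from typing import Dict


def selector_matches(selector: Dict[str, str], pod_labels: Dict[str, str]) -> bool:
    """Return True when the pod carries every selector label with an equal value."""
    return all(pod_labels.get(key) == value for key, value in selector.items())


selector = {"alertmanager": "example"}

# Hypothetical pods and labels, assumed for this example only.
pods = {
    "alertmanager-example-0": {"alertmanager": "example"},
    "alertmanager-example-1": {"alertmanager": "example"},
    "prometheus-example-0": {"prometheus": "example"},
}

selected = sorted(name for name, labels in pods.items() if selector_matches(selector, labels))
print(selected)
```
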
This is now a fully functional, highly available Alertmanager cluster, but no
alerts are fired against it yet. Let's set up Prometheus instances that
will actually fire alerts against it.

[embedmd]:# (examples/prometheus-example.yaml)
```yaml
apiVersion: monitoring.coreos.com/v1alpha1
kind: Prometheus
metadata:
  name: example
spec:
  replicas: 2
  alerting:
    alertmanagers:
    - namespace: default
      name: alertmanager-example
      port: web
  serviceMonitorSelector:
    matchLabels:
      team: frontend
  resources:
    requests:
      memory: 400Mi
```
Prometheus rule files are held in a `ConfigMap` called
`prometheus-<prometheus-object-name>-rules`. All top-level files that end with
the `.rules` extension will be loaded by a Prometheus instance.

[embedmd]:# (examples/prometheus-example-rules.yaml)
```yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: prometheus-example-rules
data:
  example.rules: |
    ALERT ExampleAlert
      IF vector(1)
```
> Note: the Prometheus Operator creates an empty `ConfigMap` if one does not
> already exist.
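
The naming convention and the `.rules` filter described above can be expressed in a few lines. This sketch only mirrors the rule as stated in this guide; the helper names are hypothetical, and the operator's real implementation may differ.

```python
from typing import Dict, List


def rules_configmap_name(prometheus_name: str) -> str:
    """Name of the rules ConfigMap for a given Prometheus object."""
    return f"prometheus-{prometheus_name}-rules"


def loaded_rule_files(configmap_data: Dict[str, str]) -> List[str]:
    """Keys of the ConfigMap that would be loaded as rule files."""
    return sorted(key for key in configmap_data if key.endswith(".rules"))


print(rules_configmap_name("example"))  # prints: prometheus-example-rules

# The second key is a hypothetical non-rule file, included to show the filter.
data = {
    "example.rules": "ALERT ExampleAlert\n  IF vector(1)\n",
    "notes.txt": "not a rule file",
}
print(loaded_rule_files(data))  # prints: ['example.rules']
```
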
The example `ConfigMap` above triggers an alert immediately and permanently,
which is only useful for demonstration purposes. To validate that everything is
working properly, have a look at each of the Prometheus web UIs.

To view the web UI without a `Service`, `kubectl`'s proxy
functionality can be used.

Run:
```bash
kubectl proxy --port=8001
```
The web UI of each Prometheus instance can then be viewed. Both instances have a
firing alert called `ExampleAlert`, as defined in the loaded alerting rules.

* http://localhost:8001/api/v1/proxy/namespaces/default/pods/prometheus-example-0:9090/alerts
* http://localhost:8001/api/v1/proxy/namespaces/default/pods/prometheus-example-1:9090/alerts

The "Runtime & Build Information" status page of the Prometheus
web UI shows the discovered and active Alertmanagers that the Prometheus
instance will fire alerts against.

* http://localhost:8001/api/v1/proxy/namespaces/default/pods/prometheus-example-0:9090/status
* http://localhost:8001/api/v1/proxy/namespaces/default/pods/prometheus-example-1:9090/status

Both pages show three discovered Alertmanagers, matching the `replicas: 3` of
the `example` Alertmanager cluster.
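
The per-replica URLs listed above all follow one pattern, so they can be generated instead of typed by hand. The helper below is a hypothetical convenience, not part of the operator. It assumes `kubectl proxy` is listening on port `8001` as started earlier, and that pods are named `prometheus-<name>-<ordinal>` as in the URLs above.

```python
from typing import List


def proxy_urls(
    name: str,
    replicas: int,
    page: str,
    namespace: str = "default",
    proxy_port: int = 8001,
    pod_port: int = 9090,
) -> List[str]:
    """Build the kubectl-proxy URL of one web UI page for every replica."""
    return [
        f"http://localhost:{proxy_port}/api/v1/proxy/namespaces/{namespace}"
        f"/pods/prometheus-{name}-{ordinal}:{pod_port}/{page}"
        for ordinal in range(replicas)
    ]


for page in ("alerts", "status"):
    for url in proxy_urls("example", replicas=2, page=page):
        print(url)
```
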

The Alertmanager web UI now shows a single active alert, even though both
Prometheus instances are firing it. [Configuring the
Alertmanager](https://prometheus.io/docs/alerting/configuration/) further
allows custom alert routing, grouping and notification mechanisms.

View file

@ -1,7 +1,7 @@
 apiVersion: v1
 kind: ConfigMap
 metadata:
-  name: alertmanager-main
+  name: alertmanager-example
 data:
   alertmanager.yaml: |-
     global:

View file

@ -1,10 +1,7 @@
 apiVersion: v1
 kind: Service
 metadata:
-  name: alertmanager-main
-  labels:
-    app: alertmanager
-    alertmanager: main
+  name: alertmanager-example
 spec:
   type: NodePort
   ports:
@ -14,4 +11,4 @@ spec:
     protocol: TCP
     targetPort: web
   selector:
-    alertmanager: main
+    alertmanager: example

View file

@ -0,0 +1,6 @@
apiVersion: monitoring.coreos.com/v1alpha1
kind: Alertmanager
metadata:
  name: example
spec:
  replicas: 3

View file

@ -0,0 +1,8 @@
kind: ConfigMap
apiVersion: v1
metadata:
  name: prometheus-example-rules
data:
  example.rules: |
    ALERT ExampleAlert
      IF vector(1)

View file

@ -0,0 +1,17 @@
apiVersion: monitoring.coreos.com/v1alpha1
kind: Prometheus
metadata:
  name: example
spec:
  replicas: 2
  alerting:
    alertmanagers:
    - namespace: default
      name: alertmanager-example
      port: web
  serviceMonitorSelector:
    matchLabels:
      team: frontend
  resources:
    requests:
      memory: 400Mi

View file

@ -54,6 +54,6 @@ embedmd:
 	go get github.com/campoy/embedmd
 docs: embedmd
-	echo "test"
+	embedmd -w `find Documentation -name "*.md"`
 .PHONY: all build crossbuild test format check-license container e2e-test e2e-status e2e clean-e2e embedmd docs

View file

@ -1,9 +0,0 @@
apiVersion: "monitoring.coreos.com/v1alpha1"
kind: "Alertmanager"
metadata:
  name: "main"
  labels:
    alertmanager: "main"
spec:
  replicas: 3
  version: v0.5.0

View file

@ -1,8 +0,0 @@
kind: ConfigMap
apiVersion: v1
metadata:
  name: prometheus-main-rules
data:
  test.rules: |
    ALERT SomethingIsUp
      IF up == 1