mirror of
https://github.com/prometheus-operator/prometheus-operator.git
synced 2025-04-21 11:48:53 +00:00
Getting-Started Page for Platform guide (#6887)
* Getting-started page for Platform guide
* Modified introduction of Alerting page
* Integrating Prometheus section moved to Getting-started page
* Examples modified
* RBAC line modified
parent 55987aa91d
commit 02ffdd7a27
13 changed files with 209 additions and 127 deletions
Documentation
getting-started.md
high-availability.md
operator.md
rbac-crd.md
rbac.md
thanos.md
troubleshooting.md
user-guides
example/user-guides/getting-started
183
Documentation/getting-started.md
Normal file
@@ -0,0 +1,183 @@
---
weight: 201
toc: true
title: Getting Started
menu:
    docs:
        parent: user-guides
lead: ""
images: []
draft: false
description: Getting started page for Platform Guide
---

This guide assumes you have a basic understanding of the Prometheus Operator. If you are new to it, please start with the [Introduction](introduction.md) page before proceeding. This guide will walk you through deploying Prometheus and Alertmanager instances.

## Deploying Prometheus

To deploy a Prometheus instance, you must create the [RBAC](https://kubernetes.io/docs/reference/access-authn-authz/authorization/) rules for the Prometheus service account.

First, create a ServiceAccount for Prometheus.

```yaml mdox-exec="cat example/rbac/prometheus/prometheus-service-account.yaml"
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
```

Next, create a ClusterRole that grants Prometheus the necessary permissions to discover and scrape the targets within the cluster.

```yaml mdox-exec="cat example/rbac/prometheus/prometheus-cluster-role.yaml"
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/metrics
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources:
  - configmaps
  verbs: ["get"]
- apiGroups:
  - discovery.k8s.io
  resources:
  - endpointslices
  verbs: ["get", "list", "watch"]
- apiGroups:
  - networking.k8s.io
  resources:
  - ingresses
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
```

Now, create a ClusterRoleBinding to bind the ClusterRole to the Prometheus ServiceAccount.

```yaml mdox-exec="cat example/rbac/prometheus/prometheus-cluster-role-binding.yaml"
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: default
```
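
The `namespace` under `subjects` has to be the namespace in which the ServiceAccount was created; otherwise the binding grants nothing to Prometheus. As a minimal sketch, assuming a hypothetical `monitoring` namespace (it is not part of the example files) instead of `default`, the binding might look like this:

```yaml
# Sketch only: `monitoring` is a hypothetical namespace, not used elsewhere in this guide.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  # Must match metadata.namespace of the ServiceAccount.
  namespace: monitoring
```

In that case the Prometheus resource would also have to live in `monitoring`, because `serviceAccountName` refers to a ServiceAccount in the resource's own namespace.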

Apply all these manifests to create the necessary RBAC resources. Now you are all set to deploy a Prometheus instance. Here is an example of a basic Prometheus instance manifest.

```yaml mdox-exec="cat example/user-guides/getting-started/prometheus.yaml"
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  serviceAccountName: prometheus
```

To verify that the instance is up and running, run:

```bash
kubectl get -n default prometheus prometheus -w
```

For more information, see the [Prometheus Operator RBAC guide]({{< ref "rbac" >}}).

## Deploying Alertmanager

The following example creates an Alertmanager cluster with three replicas.

```yaml mdox-exec="cat example/user-guides/alerting/alertmanager-example.yaml"
apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  name: example
spec:
  replicas: 3
```

Wait for all Alertmanager pods to be ready:

```bash
kubectl get pods -l alertmanager=example -w
```

On its own, however, this Alertmanager cluster does nothing useful. To use it properly, it is important to understand the relationship between Prometheus and Alertmanager. Alertmanager is used to:

* Deduplicate alerts received from Prometheus.
* Silence alerts.
* Route and send grouped notifications to various integrations (PagerDuty, OpsGenie, mail, chat, …).

To put the Alertmanager instances to use, you need to integrate them with Prometheus.

## Integrating Alertmanager With Prometheus

### Exposing the Alertmanager service

To access the Alertmanager interface, you have to expose the service to the outside. For
simplicity, we use a `NodePort` Service.

```yaml mdox-exec="cat example/user-guides/alerting/alertmanager-example-service.yaml"
apiVersion: v1
kind: Service
metadata:
  name: alertmanager-example
spec:
  type: NodePort
  ports:
  - name: web
    nodePort: 30903
    port: 9093
    protocol: TCP
    targetPort: web
  selector:
    alertmanager: example
```

Once the Service is created, the Alertmanager web server is available under the
node's IP address on port `30903`.

> Note: Exposing the Alertmanager web server this way may not be an applicable solution. Read more about the possible options in the [Ingress guide](user-guides/exposing-prometheus-and-alertmanager.md).
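
When the web interface only has to be reachable from inside the cluster, for instance by Prometheus itself, a plain `ClusterIP` Service may be enough. The following is a minimal sketch derived from the manifest above with the `type` and `nodePort` fields dropped; it is an assumption for illustration, not one of the repository's example files:

```yaml
# Sketch only: in-cluster variant of the NodePort Service shown above.
apiVersion: v1
kind: Service
metadata:
  name: alertmanager-example
spec:
  # No `type` is set, so Kubernetes defaults to ClusterIP.
  ports:
  - name: web
    port: 9093
    protocol: TCP
    targetPort: web
  selector:
    alertmanager: example
```

The Service name and the `web` port name are kept unchanged so that anything referring to them keeps working.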

### Configuring Alertmanager in Prometheus

The Alertmanager cluster is now fully functional and highly available, but no
alerts are fired against it.

First, create a Prometheus instance that will send alerts to the Alertmanager cluster:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: example
spec:
  serviceAccountName: prometheus
  replicas: 2
  alerting:
    alertmanagers:
    - namespace: default
      name: alertmanager-example
      port: web
```

The `Prometheus` resource discovers all of the Alertmanager instances behind
the `Service` created before (pay attention to `name`, `namespace` and `port`
fields which should match with the definition of the Alertmanager Service).
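
The correspondence can be read off by putting the relevant fields side by side. The excerpt below is an annotated sketch of the two manifests already shown, not an additional resource to apply; the `namespace: default` on the Service is an assumption, since the example Service manifest does not set one explicitly:

```yaml
# Excerpt of the Alertmanager Service
metadata:
  name: alertmanager-example   # referenced by alerting.alertmanagers[].name
  namespace: default           # assumed: the Service was created in `default`
spec:
  ports:
  - name: web                  # referenced by alerting.alertmanagers[].port
---
# Excerpt of the Prometheus resource
spec:
  alerting:
    alertmanagers:
    - namespace: default
      name: alertmanager-example
      port: web
```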

Open the Prometheus web interface, go to the "Status > Runtime & Build
Information" page and check that Prometheus has discovered 3 Alertmanager
instances.
@@ -1,5 +1,5 @@
 ---
-weight: 206
+weight: 207
 toc: true
 title: High Availability
 menu:
@@ -1,5 +1,5 @@
 ---
-weight: 209
+weight: 210
 toc: false
 title: CLI reference
 menu:
@@ -1,5 +1,5 @@
 ---
-weight: 205
+weight: 206
 toc: true
 title: RBAC for CRDs
 menu:
@@ -1,5 +1,5 @@
 ---
-weight: 204
+weight: 205
 toc: true
 title: RBAC
 menu:
@@ -1,5 +1,5 @@
 ---
-weight: 203
+weight: 204
 toc: true
 title: Thanos
 menu:
@@ -1,5 +1,5 @@
 ---
-weight: 210
+weight: 211
 toc: true
 title: Troubleshooting
 menu:
@@ -1,7 +1,7 @@
 ---
 weight: 252
 toc: true
-title: Alerting
+title: Alerting Routes
 menu:
     docs:
         parent: user-guides
@@ -11,72 +11,31 @@ draft: false
description: Alerting guide
---

This guide assumes that you have a basic understanding of the Prometheus
operator, and that you have already followed the [Getting Started]({{< ref
"getting-started" >}}) guide.
This guide assumes you already have a basic understanding of the Prometheus Operator and have gone through the [Getting Started]({{< ref "getting-started" >}}) guide. We’re also expecting you to know how to run an Alertmanager instance.

{{< alert icon="👉" text="Prometheus Operator requires use of Kubernetes v1.16.x and up."/>}}

The Prometheus Operator introduces an `Alertmanager` resource, which allows
users to declaratively describe an Alertmanager cluster. To successfully deploy
an Alertmanager cluster, it is important to understand the contract between
Prometheus and Alertmanager. Alertmanager is used to:

* Deduplicate alerts received from Prometheus.
* Silence alerts.
* Route and send grouped notifications to various integrations (PagerDuty, OpsGenie, mail, chat, ...).

The Prometheus Operator also introduces an `AlertmanagerConfig` resource, which
allows users to declaratively describe Alertmanager configurations.

> Note: The AlertmanagerConfig resource is currently v1alpha1, testing and feedback are welcome.
In this guide, we'll explore the various methods for managing Alertmanager configurations within your Kubernetes cluster.

Prometheus' configuration also includes "rule files", which contain the
[alerting
rules](https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/).
When an alerting rule triggers, it fires that alert against *all* Alertmanager
instances, on *every* rule evaluation interval. The Alertmanager instances
When an alerting rule is triggered, it fires that alert to ***all*** Alertmanager
instances, on ***every*** rule evaluation interval. The Alertmanager instances
communicate to each other which notifications have already been sent out. For
more information on this system design, see the [High Availability]({{< ref "high-availability" >}})
page.

## Pre-requisites

You have a running Prometheus operator.

## Deploying Alertmanager

First, let's create an Alertmanager cluster with three replicas:

```yaml mdox-exec="cat example/user-guides/alerting/alertmanager-example.yaml"
apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  name: example
spec:
  replicas: 3
```

Wait for all Alertmanager pods to be ready:

```bash
kubectl get pods -l alertmanager=example -w
```

## Managing Alertmanager configuration

By default, the Alertmanager instances will start with a minimal configuration
which isn't really useful since it doesn't send any notification when receiving
alerts.

You have several options to provide the [Alertmanager configuration](https://prometheus.io/docs/alerting/configuration/):
1. You can use a native Alertmanager configuration file stored in a Kubernetes secret.
2. You can use `spec.alertmanagerConfiguration` to reference an
   AlertmanagerConfig object in the same namespace which defines the main
1. Using a native Alertmanager configuration file stored in a [Kubernetes secret](https://kubernetes.io/docs/concepts/configuration/secret/).
2. Using `spec.alertmanagerConfiguration` to reference an
   `AlertmanagerConfig` object in the same namespace which defines the main
   Alertmanager configuration.
3. You can define `spec.alertmanagerConfigSelector` and
3. Using `spec.alertmanagerConfigSelector` and
   `spec.alertmanagerConfigNamespaceSelector` to tell the operator which
   AlertmanagerConfigs objects should be selected and merged with the main
   `AlertmanagerConfig` objects should be selected and merged with the main
   Alertmanager configuration.

### Using a Kubernetes Secret

@@ -205,72 +164,6 @@ will be a global AlertmanagerConfig. When the operator generates the
Alertmanager configuration from it, the namespace label will not be enforced
for routes and inhibition rules.

## Exposing the Alertmanager service

To access the Alertmanager interface, you have to expose the service to the outside. For
simplicity, we use a `NodePort` Service.

```yaml mdox-exec="cat example/user-guides/alerting/alertmanager-example-service.yaml"
apiVersion: v1
kind: Service
metadata:
  name: alertmanager-example
spec:
  type: NodePort
  ports:
  - name: web
    nodePort: 30903
    port: 9093
    protocol: TCP
    targetPort: web
  selector:
    alertmanager: example
```

Once the Service is created, the Alertmanager web server is available under the
node's IP address on port `30903`.

> Note: Exposing the Alertmanager web server this way may not be an applicable solution. Read more about the possible options in the [Ingress guide](exposing-prometheus-and-alertmanager.md).

## Integrating with Prometheus

### Configuring Alertmanager in Prometheus

This Alertmanager cluster is now fully functional and highly available, but no
alerts are fired against it.

First, create a Prometheus instance that will send alerts to the Alertmanager cluster:

```yaml mdox-exec="cat example/user-guides/alerting/prometheus-example.yaml"
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: example
spec:
  serviceAccountName: prometheus
  replicas: 2
  alerting:
    alertmanagers:
    - namespace: default
      name: alertmanager-example
      port: web
  serviceMonitorSelector:
    matchLabels:
      team: frontend
  ruleSelector:
    matchLabels:
      role: alert-rules
      prometheus: example
```

The `Prometheus` resource discovers all of the Alertmanager instances behind
the `Service` created before (pay attention to `name`, `namespace` and `port`
fields which should match with the definition of the Alertmanager Service).

Open the Prometheus web interface, go to the "Status > Runtime & Build
Information" page and check that the Prometheus has discovered 3 Alertmanager
instances.

### Deploying Prometheus Rules

The `PrometheusRule` CRD allows to define alerting and recording rules. The
@@ -1,5 +1,5 @@
 ---
-weight: 202
+weight: 203
 toc: true
 title: Prometheus Agent
 menu:
@@ -1,5 +1,5 @@
 ---
-weight: 207
+weight: 208
 toc: true
 title: Storage
 menu:
@@ -1,5 +1,5 @@
 ---
-weight: 208
+weight: 209
 toc: true
 title: Strategic Merge Patch
 menu:
@@ -1,5 +1,5 @@
 ---
-weight: 201
+weight: 202
 toc: true
 title: Admission webhook
 menu:
6
example/user-guides/getting-started/prometheus.yaml
Normal file
@@ -0,0 +1,6 @@
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  serviceAccountName: prometheus