prometheus-operator

mirror of https://github.com/prometheus-operator/prometheus-operator.git synced 2025-04-16 01:06:27 +00:00

Author	SHA1	Message	Date
paulfantom	35b2954459	pkg/prometheus: remove liveness probe Remove the liveness probe to prevent Kubernetes from killing the Prometheus pod during WAL replay. This should be reverted around the Kubernetes 1.21 release, at which point a startupProbe should be added.	2020-09-15 12:05:18 +02:00
Simon Pasquier	675d303ee0	pkg/prometheus: enable Thanos uploads only when needed (#3485) When the Thanos spec doesn't configure object storage, there's no need to configure the Thanos sidecar for block uploads or to mount the Prometheus data volume. Signed-off-by: Simon Pasquier <spasquie@redhat.com>	2020-09-11 16:16:19 +02:00
Sergiusz Urbaniak	289ee029ef	Merge pull request #3440 from s-urbaniak/remove-mlw remove multilistwatcher and denylistfilter	2020-09-08 07:34:39 +02:00
Sergiusz Urbaniak	c786d8ef2e	pkg/informers: add missing godoc	2020-09-07 15:24:18 +02:00
Sergiusz Urbaniak	34ba8237f5	pkg/informers: fix stylistic nits Co-authored-by: Simon Pasquier <spasquie@redhat.com>	2020-09-04 17:08:33 +02:00
Sergiusz Urbaniak	4f36b38e6c	pkg/informers: add unit tests	2020-09-04 17:08:33 +02:00
Sergiusz Urbaniak	badeafdc36	pkg/informers: add godoc	2020-09-04 17:08:33 +02:00
Sergiusz Urbaniak	5e94344182	pkg/listwatch: remove multilistwatcher	2020-09-04 17:08:33 +02:00
Sergiusz Urbaniak	2379f59f6f	pkg/prometheus: check error immediately after List	2020-09-04 17:08:33 +02:00
Sergiusz Urbaniak	27c1680975	pkg/*: renamings and reformatting	2020-09-04 17:08:33 +02:00
Sergiusz Urbaniak	0c9283465a	pkg/thanos: remove multilistwatcher	2020-09-04 17:08:33 +02:00
Sergiusz Urbaniak	920f2490d9	pkg/alertmanager: remove multilistwatcher	2020-09-04 17:08:33 +02:00
Sergiusz Urbaniak	e9ad330bf8	pkg/prometheus: remove multilistwatcher	2020-09-04 17:08:33 +02:00
Sergiusz Urbaniak	f22fd2c7c0	pkg/listwatch: remove denylist ListerWatcher	2020-09-04 16:58:51 +02:00
Sergiusz Urbaniak	54bbe620bb	pkg/informers: initial commit	2020-09-04 16:58:51 +02:00
Simon Pasquier	3b2e17d714	Instrument client-go requests This change adds 3 metrics tracking client-go requests to the Kubernetes API: * `prometheus_operator_kubernetes_client_http_requests_total`, counter with a `status_code` label. * `prometheus_operator_kubernetes_client_http_request_duration_seconds`, summary with an `endpoint` label. * `prometheus_operator_kubernetes_client_rate_limiter_duration_seconds`, summary with an `endpoint` label. Signed-off-by: Simon Pasquier <spasquie@redhat.com>	2020-09-04 16:03:13 +02:00
Simon Pasquier	053da63f0b	*: pass context.Context to client-go functions Signed-off-by: Simon Pasquier <spasquie@redhat.com>	2020-09-03 14:13:31 +02:00
Sergiusz Urbaniak	d1e9fc77e2	Merge pull request #3395 from matthiasr/mr/pkg-monitoring Break the API types out into their own module	2020-09-02 09:35:50 +02:00
Sergiusz Urbaniak	909fc64585	Merge pull request #3445 from simonpasquier/fix-3327 pkg/prometheus: skip invalid service monitors	2020-08-31 16:56:45 +02:00
Sergiusz Urbaniak	608be1baec	Merge pull request #3436 from hwoarang/add-cluster-reconnect-timeout pkg/alertmanager: Use lower value for --cluster.reconnect-timeout	2020-08-31 15:39:11 +02:00
Simon Pasquier	7ed47043ce	Add tests for assetStore Signed-off-by: Simon Pasquier <spasquie@redhat.com>	2020-08-31 14:51:30 +02:00
Simon Pasquier	a0a1816f4c	Use cache.Store instead of custom stores	2020-08-31 10:51:09 +02:00
Simon Pasquier	caf6b9f3ce	pkg/prometheus: skip invalid service monitors Previously the operator would fail the reconciliation when a service monitor was referencing a bad secret or configmap (either the object didn't exist or the key was missing). With this change, the operator will skip these service monitors. Signed-off-by: Simon Pasquier <spasquie@redhat.com>	2020-08-31 10:51:09 +02:00
Matthias Rampke	2a67feba74	Break the API types out into their own module This allows others to import them without incurring all the dependencies of the operator transitively, and avoid version conflicts with other dependencies as much as possible. Fixed #3097. Signed-off-by: Matthias Rampke <matthias@rampke.de>	2020-08-28 13:41:46 +00:00
Matthias Rampke	76d5211a6c	Avoid CI timeouts in TestConfigGeneration (#3432) This test generates the same configuration many times, for each Prometheus version, to see if it is deterministic. As the compatibility matrix grows, test times increase. Now, this sometimes fails in CI because Travis kills jobs after 10 minutes of no output. Run each version as a subtest, and run tests with `-v`, so that output is produced after each version. This avoids the no-output timeout. Parallelize testing for each Prometheus version. When the tests are run with `-short` (as in `make test-unit`), only try one hundred iterations. With the race detector on, as in that target, this takes around 5 seconds. Without the race detector, short tests on this package now run quickly enough for fast iteration in an IDE. Add an additional target and Travis job for running the long tests, but without the race detector. This brings the run time for the full 1000 iterations per version to under a minute. Signed-off-by: Matthias Rampke <matthias@rampke.de>	2020-08-28 14:53:32 +02:00
Markos Chandras	86102e73e9	pkg/alertmanager: Use lower value for --cluster.reconnect-timeout Alertmanager in cluster mode resolves the DNS name of each peer and caches its IP address, which it uses at regular intervals to 'refresh' the connection. In highly dynamic environments like Kubernetes, Alertmanager pods may come and go frequently. The default timeout of 6h is not suitable in that case, as Alertmanager will keep trying to reconnect to a nonexistent pod until it gives up and removes that peer from the member list. During this period, the cluster is reported as degraded because of the missing member. It is therefore best to use a lower value, which allows Alertmanager to remove the pod from its list of peers soon after it disappears. Related: https://github.com/prometheus/alertmanager/issues/2250	2020-08-26 13:02:35 +03:00
Simon Pasquier	0811e8f65c	pkg/alertmanager: cleanup resources via OwnerReferences The Alertmanager controller deleted dependent resources manually, while the Prometheus and Thanos controllers rely on Kubernetes to do the work using OwnerReferences. Signed-off-by: Simon Pasquier <spasquie@redhat.com>	2020-08-20 16:21:07 +02:00
Simon Pasquier	e64718cb6b	pkg: add prometheus_operator_reconcile_operations_total metric (#3415) * pkg: add prometheus_operator_reconcile_operations_total metric We already have the `prometheus_operator_reconcile_errors_total` metric to track the number of failed reconciliation attempts, but we lack the total number of attempts, which makes it harder to alert on. With this change, we can compute the ratio of failed reconciliations. Signed-off-by: Simon Pasquier <spasquie@redhat.com> * Update alert definition with new metric	2020-08-19 16:41:02 +02:00
Matthias Rampke	5c1f668c97	Fix validation logic for SecretOrConfigMap This was flagged by [golangci-lint](https://staticcheck.io/docs/checks#SA4022). The check was for the address of the pointer, not the value. Add a test (failing on master) to verify this, and fix the validation logic. Follow-up to #2716. Signed-off-by: Matthias Rampke <matthias@rampke.de>	2020-08-17 11:50:59 +00:00
Sergiusz Urbaniak	54704fac8f	Merge pull request #3392 from lilic/fix-image-tag-version pkg/operator/image.go: Adjust image path building	2020-08-11 10:43:24 +02:00
Lili Cosic	7b4a9d740d	pkg/prometheus/statefulset_test.go: Adjust tests	2020-08-10 14:49:55 +02:00
Lili Cosic	49e2842c49	pkg/alertmanager,thanos,prometheus: Adjust usage	2020-08-10 14:49:55 +02:00
Lili Cosic	caed11f835	pkg/operator/image.go: Adjust image path building The image may already contain a tag; if it does, the image is returned as-is. Otherwise the SHA/digest takes priority over the tag, and the version is used last for historical reasons.	2020-08-10 14:49:55 +02:00
郑佳金	d90df0a0e7	make generate	2020-08-06 16:59:29 +08:00
郑佳金	9c066705a4	feat: support special post alerts timeout	2020-08-06 16:59:15 +08:00
paulfantom	67780ccc45	repository migration to prometheus-operator organization	2020-08-05 13:13:46 +02:00
Sören Jentzsch	7778fe0239	Allow enabling Alertmanager HA cluster mode even when running with a single replica, via the newly introduced forceEnableClusterMode flag. With #3196 we lost the ability to set up Alertmanager clusters with a single replica across multiple Kubernetes clusters. Fixes #3337	2020-08-04 01:36:10 +02:00
Frederic Branczyk	ef0bc1c45a	Merge pull request #3377 from coderanger/patch-1 🐛 Don't overwrite __param_target	2020-08-03 11:11:06 +02:00
Noah Kantrowitz	41c2202698	🐛 Don't overwrite __param_target It is already set above using the sd metadata, no need to overwrite it back to __address__.	2020-08-01 23:15:58 -07:00
Frederic Branczyk	6c8f7fa6b6	Merge pull request #3374 from vincent-pli/clearify-targetport-servicemonitor Clarify targetPort in endpoint	2020-07-31 11:26:17 +02:00
pengli	2ebe8247d8	Clarify targetPort in endpoint	2020-07-30 17:39:15 -07:00
Lili Cosic	8f49757672	pkg/listwatch: Change to accept single instance of rvs	2020-07-29 11:46:06 +02:00
Michal Fojtik	7bbd81692a	listwatch: do not duplicate resource versions	2020-07-29 11:45:59 +02:00
Frederic Branczyk	f6b342d3f7	Merge pull request #3364 from coreos/revert-3308-normalize-default-durations Revert "Normalize default durations"	2020-07-27 11:38:28 +02:00
Frederic Branczyk	f1e0131c1b	Merge pull request #3358 from jbfavre/fix_prometheus_version_propagation Propagate Prometheus image version to statefulset	2020-07-27 11:20:16 +02:00
Frederic Branczyk	024da7b667	Fix expected default probe scrape interval	2020-07-27 10:29:46 +02:00
Frederic Branczyk	1d00eeb962	Revert "Normalize default durations"	2020-07-27 07:42:21 +02:00
Jean-Baptiste Favre	c710ec3e39	Fix Go format	2020-07-24 14:14:13 +02:00
Jean-Baptiste Favre	dc2a4527c2	Improve unit tests for Version, Tag & SHA matrix	2020-07-24 14:07:58 +02:00
Simon Pasquier	2021270248	pkg: instrument resources being tracked by the operator This change adds a new `prometheus_operator_resources` metric that keeps track of the number of resources currently managed by the operator. The metric is broken down by controller and type of resource. Signed-off-by: Simon Pasquier <spasquie@redhat.com>	2020-07-24 13:39:01 +02:00

990 commits