mirror of https://github.com/prometheus-operator/prometheus-operator.git synced 2025-04-16 09:16:38 +00:00

reloader: don't fail on envvar expansion errors

Refer: https://github.com/thanos-io/thanos/pull/7429
Fixes: https://github.com/prometheus-operator/prometheus-operator/issues/6136
Signed-off-by: Pranshu Srivastava <rexagod@gmail.com>
Pranshu Srivastava 2024-06-24 01:17:45 +05:30
parent 2c1cba5b8b
commit 0002aace2d
No known key found for this signature in database
GPG key ID: 63938388A4528764
8 changed files with 21 additions and 20 deletions

@@ -129,7 +129,7 @@ We intentionally don't want to spin up new instances while others that are marke
Prometheus Agents are different than servers since queries are not available in this mode. Their only responsibility is scraping metrics and pushing them via remote-write to a long-term storage backend, making the scale-down experience much easier to handle.
-When receiving the SIGTERM signal, the Prometheus Agent should gracefully handle the signal by finishing all remote-write queues before ending the process. Prometheus-Operator, by default, adjusts the [Graceful Termination Period](https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-termination) of Prometheus/PrometheusAgent pods to 600s. Ten minutes should be enough for them to flush the remote-write queue, but, if needed, users can redefine Graceful Termination Period using [Strategic Merge Patch](https://prometheus-operator.dev/docs/operator/strategic-merge-patch/).
+When receiving the SIGTERM signal, the Prometheus Agent should gracefully handle the signal by finishing all remote-write queues before ending the process. Prometheus-Operator, by default, adjusts the [Graceful Termination Period](https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-termination) of Prometheus/PrometheusAgent pods to 600s. Ten minutes should be enough for them to flush the remote-write queue, but, if needed, users can redefine Graceful Termination Period using [Strategic Merge Patch](https://prometheus-operator.dev/docs/platform/strategic-merge-patch/).
Since there's no use case for retaining Prometheus Agents, its CRD will not be extended with the `RetentionPolicy` mentioned in [Graceful scale-down of Prometheus Servers](#graceful-scale-down-of-prometheus-servers)

@@ -64,7 +64,7 @@ Currently, we already have a PrometheusAgent CRD that supports StatefulSet deplo
The reason for enhancing existing CRD (instead of introducing a new CRD) is it would take less time to finish the MVP. We'll let users experiment with the MVP, and in case users report a separate CRD is needed, we'll separate the logic of DaemonSet deployment into a new CRD later.
-The current [PrometheusAgent CRD](https://prometheus-operator.dev/docs/operator/api/#monitoring.coreos.com/v1alpha1.PrometheusAgent) already has sufficient fields for the DaemonSet deployment. The DaemonSet deployment can use all the existing fields in the CRD except the ones related to:
+The current [PrometheusAgent CRD](https://prometheus-operator.dev/docs/platform/prometheus-agent/) already has sufficient fields for the DaemonSet deployment. The DaemonSet deployment can use all the existing fields in the CRD except the ones related to:
* Selectors for service, probe, ScrapeConfig
* Replica
* Shard

@@ -22,7 +22,7 @@ After we have a proper structure, it will become relatively easy to add informat
A good documentation is one that is easy to understand for a newcomer and provides the exact amount of information that is needed according to the need. But looking at the current documentation structure, a lot of topics seem misplaced. For example, there is no need for a **"Contributing"** page in the prologue section. Prologue section should only give the introduction and the prerequisites for the project. Due to this, a user might need to put more effort to search for relevant information and this might decrease the user's productivity.
-Currently, there is some unnecessary information from the documentation creating misconceptions in the user's mind. For example, let us look at [#6046](https://github.com/prometheus-operator/prometheus-operator/issues/6046) which tells us that there is no Ingress Guide present in the current documentation. But, if we look at the website, there are links to the **“Ingress Guide”** on the [Getting-Started](https://prometheus-operator.dev/docs/user-guides/getting-started/#exposing-the-prometheus-service) and [Alerting page](https://prometheus-operator.dev/docs/user-guides/alerting/#exposing-the-alertmanager-service) in **User-Guide**. Due to this, many users will report the same issue and it will take time for maintainers to resolve them.
+Currently, there is some unnecessary information from the documentation creating misconceptions in the user's mind. For example, let us look at [#6046](https://github.com/prometheus-operator/prometheus-operator/issues/6046) which tells us that there is no Ingress Guide present in the current documentation. But, if we look at the website, there are links to the **“Ingress Guide”** on the [Getting-Started](https://prometheus-operator.dev/docs/developer/getting-started/#exposing-the-prometheus-service) and [Alerting page](https://prometheus-operator.dev/docs/developer/alerting/#exposing-the-alertmanager-service) in **User-Guide**. Due to this, many users will report the same issue and it will take time for maintainers to resolve them.
Incorporation of new topics is also difficult if the structure is not up to mark because more time and effort is needed to decide the best place to add a topic which can often lead to decrease in productivity of a maintainer. For example, in issue [#3553](https://github.com/prometheus-operator/prometheus-operator/issues/3553#issuecomment-726733177), it has been mentioned that basic architecture needs to be worked upon before adding the diagram as there is no section talking about **“namespace selection”** in the current documentation.

@@ -43,7 +43,7 @@ to generate scrape configurations.
* `kubernetes_sd`
* `consul_sd`
-The following examples are basic and don't cover all the supported service discovery mechanisms. The CRD is constantly evolving, adding new features and support for new Service Discoveries. Check the [API documentation](https://prometheus-operator.dev/docs/operator/api/#monitoring.coreos.com/v1alpha1.ScrapeConfig) to see all supported fields.
+The following examples are basic and don't cover all the supported service discovery mechanisms. The CRD is constantly evolving, adding new features and support for new Service Discoveries. Check the [API documentation](https://prometheus-operator.dev/docs/developer/scrapeconfig/) to see all supported fields.
If you have an interest in another service discovery mechanism or you see something missing in the implementation, please
[open an issue](https://github.com/prometheus-operator/prometheus-operator/issues).

@@ -95,7 +95,7 @@ The Prometheus operator automatically detects changes in the Kubernetes API serv
matching deployments and configurations are kept in sync.
To learn more about the CRDs introduced by the Prometheus Operator have a look
-at the [design](https://prometheus-operator.dev/docs/operator/design/) page.
+at the [design](https://prometheus-operator.dev/docs/getting-started/design/) page.
## Dynamic Admission Control
@@ -103,7 +103,7 @@ To prevent invalid Prometheus alerting and recording rules from causing failures
an [admission webhook](https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/)
is provided to validate `PrometheusRule` resources upon initial creation or update.
-For more information on this feature, see the [user guide](https://prometheus-operator.dev/docs/user-guides/webhook/).
+For more information on this feature, see the [user guide](https://prometheus-operator.dev/docs/platform/webhook/).
## Quickstart

@@ -157,12 +157,13 @@ func main() {
 	{
 		opts := reloader.Options{
-			CfgFile:       *cfgFile,
-			CfgOutputFile: *cfgSubstFile,
-			WatchedDirs:   *watchedDir,
-			DelayInterval: *delayInterval,
-			WatchInterval: *watchInterval,
-			RetryInterval: *retryInterval,
+			CfgFile:                       *cfgFile,
+			CfgOutputFile:                 *cfgSubstFile,
+			WatchedDirs:                   *watchedDir,
+			DelayInterval:                 *delayInterval,
+			WatchInterval:                 *watchInterval,
+			RetryInterval:                 *retryInterval,
+			TolerateEnvVarExpansionErrors: true,
 		}
 		switch *reloadMethod {
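
The `TolerateEnvVarExpansionErrors: true` option set in this hunk asks the reloader to keep going when a config file references an environment variable that cannot be expanded. As a rough illustration of that behaviour, the Go sketch below contrasts strict and tolerant expansion. It is not the Thanos implementation: `expandEnv`, its `lookup` parameter, the `$(NAME)` reference syntax, and the choice to leave unresolved references untouched are all assumptions made for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// envRef matches references of the form $(NAME).
// The syntax is an assumption made for this sketch.
var envRef = regexp.MustCompile(`\$\(([a-zA-Z_0-9]+)\)`)

// expandEnv is a hypothetical helper, not the Thanos reloader code.
// It replaces every $(NAME) in cfg with the value returned by lookup.
// For an unset variable it returns an error when tolerate is false,
// and leaves the reference untouched when tolerate is true.
func expandEnv(cfg string, lookup func(string) (string, bool), tolerate bool) (string, error) {
	var firstErr error
	out := envRef.ReplaceAllStringFunc(cfg, func(ref string) string {
		name := envRef.FindStringSubmatch(ref)[1]
		if val, ok := lookup(name); ok {
			return val
		}
		// Remember only the first failure; later ones add no information.
		if !tolerate && firstErr == nil {
			firstErr = fmt.Errorf("environment variable %q is not set", name)
		}
		return ref
	})
	if firstErr != nil {
		return "", firstErr
	}
	return out, nil
}

func main() {
	// A fixed map stands in for the process environment.
	env := map[string]string{"POD_NAME": "prometheus-0"}
	lookup := func(k string) (string, bool) {
		v, ok := env[k]
		return v, ok
	}
	cfg := "replica: $(POD_NAME)\nshard: $(SHARD)\n"

	// Strict mode: the unset SHARD variable aborts the expansion.
	if _, err := expandEnv(cfg, lookup, false); err != nil {
		fmt.Println("strict:", err)
	}

	// Tolerant mode: POD_NAME is expanded, $(SHARD) is left as written.
	out, _ := expandEnv(cfg, lookup, true)
	fmt.Print("tolerant:\n", out)
}
```

Injecting `lookup` rather than reading the environment directly keeps the sketch deterministic; in a real process `os.LookupEnv` has the same signature and could be passed as is.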

go.mod

@@ -29,7 +29,7 @@ require (
github.com/prometheus/exporter-toolkit v0.11.0
github.com/prometheus/prometheus v0.53.0
github.com/stretchr/testify v1.9.0
-github.com/thanos-io/thanos v0.35.1
+github.com/thanos-io/thanos v0.0.0-20240702084127-fcc88c028acc
go.uber.org/automaxprocs v1.5.3
golang.org/x/exp v0.0.0-20240613232115-7f521ea00fb8
golang.org/x/net v0.26.0
@@ -101,7 +101,7 @@ require (
github.com/josharian/intern v1.0.0 // indirect
github.com/jpillora/backoff v1.0.0 // indirect
github.com/json-iterator/go v1.1.12 // indirect
-github.com/klauspost/cpuid/v2 v2.2.5 // indirect
+github.com/klauspost/cpuid/v2 v2.2.8 // indirect
github.com/mailru/easyjson v0.7.7 // indirect
github.com/metalmatze/signal v0.0.0-20210307161603-1c9aa721a97a // indirect
github.com/minio/sha256-simd v1.0.1 // indirect

go.sum

@@ -265,10 +265,10 @@ github.com/julienschmidt/httprouter v1.2.0/go.mod h1:SYymIcj16QtmaHHD7aYtjjsJG7V
github.com/julienschmidt/httprouter v1.3.0/go.mod h1:JR6WtHb+2LUe8TCKY3cZOxFyyO8IZAc4RVcycCCAKdM=
github.com/kisielk/errcheck v1.5.0/go.mod h1:pFxgyoBC7bSaBwPgfKdkLd5X25qrDl4LWUI2bnpBCr8=
github.com/kisielk/gotool v1.0.0/go.mod h1:XhKaO+MFFWcvkIS/tQcRk01m1F5IRFswLeQ+oQHNcck=
-github.com/klauspost/compress v1.17.8 h1:YcnTYrq7MikUT7k0Yb5eceMmALQPYBW/Xltxn0NAMnU=
-github.com/klauspost/compress v1.17.8/go.mod h1:Di0epgTjJY877eYKx5yC51cX2A2Vl2ibi7bDH9ttBbw=
-github.com/klauspost/cpuid/v2 v2.2.5 h1:0E5MSMDEoAulmXNFquVs//DdoomxaoTY1kUhbc/qbZg=
-github.com/klauspost/cpuid/v2 v2.2.5/go.mod h1:Lcz8mBdAVJIBVzewtcLocK12l3Y+JytZYpaMropDUws=
+github.com/klauspost/compress v1.17.9 h1:6KIumPrER1LHsvBVuDa0r5xaG0Es51mhhB9BQB2qeMA=
+github.com/klauspost/compress v1.17.9/go.mod h1:Di0epgTjJY877eYKx5yC51cX2A2Vl2ibi7bDH9ttBbw=
+github.com/klauspost/cpuid/v2 v2.2.8 h1:+StwCXwm9PdpiEkPyzBXIy+M9KUb4ODm0Zarf1kS5BM=
+github.com/klauspost/cpuid/v2 v2.2.8/go.mod h1:Lcz8mBdAVJIBVzewtcLocK12l3Y+JytZYpaMropDUws=
github.com/konsorten/go-windows-terminal-sequences v1.0.1/go.mod h1:T0+1ngSBFLxvqU3pZ+m/2kptfBszLMUkC4ZK/EgS/cQ=
github.com/konsorten/go-windows-terminal-sequences v1.0.3/go.mod h1:T0+1ngSBFLxvqU3pZ+m/2kptfBszLMUkC4ZK/EgS/cQ=
github.com/kr/logfmt v0.0.0-20140226030751-b84e30acd515/go.mod h1:+0opPa2QZZtGFBFZlji/RkVcI2GknAs/DXo4wKdlNEc=
@@ -399,8 +399,8 @@ github.com/stretchr/testify v1.5.1/go.mod h1:5W2xD1RspED5o8YsWQXVCued0rvSQ+mT+I5
github.com/stretchr/testify v1.7.0/go.mod h1:6Fq8oRcR53rry900zMqJjRRixrwX3KX962/h/Wwjteg=
github.com/stretchr/testify v1.9.0 h1:HtqpIVDClZ4nwg75+f6Lvsy/wHu+3BoSGCbBAcpTsTg=
github.com/stretchr/testify v1.9.0/go.mod h1:r2ic/lqez/lEtzL7wO/rwa5dbSLXVDPFyf8C91i36aY=
-github.com/thanos-io/thanos v0.35.1 h1:j07RPGjAe0Bhe5ceO0mSRetdkCxzCznJXXRdQqGGyao=
-github.com/thanos-io/thanos v0.35.1/go.mod h1:WHGZyM/qwp857mJr8Q0d7K6eQoLtLv+6p7RNpT/yeIE=
+github.com/thanos-io/thanos v0.0.0-20240702084127-fcc88c028acc h1:Bcc0WmbYgJ3r7jy3zDHJBC0IK7Sn9Yzt+PvbbqT94XM=
+github.com/thanos-io/thanos v0.0.0-20240702084127-fcc88c028acc/go.mod h1:f7LiW4+/xvV5+gkseMuVbQnrbFTFnCPv5+X1M6mXkn4=
github.com/xhit/go-str2duration/v2 v2.1.0 h1:lxklc02Drh6ynqX+DdPyp5pCKLUQpRT8bp8Ydu2Bstc=
github.com/xhit/go-str2duration/v2 v2.1.0/go.mod h1:ohY8p+0f07DiV6Em5LKB0s2YpLtXVyJfNt1+BlmyAsU=
github.com/yuin/goldmark v1.1.25/go.mod h1:3hX8gzYuyVAZsxl0MRgGTJEmQBFcNTphYh9decYSb74=