reloader: don't fail on envvar expansion errors

Refer: https://github.com/thanos-io/thanos/pull/7429 Fixes: https://github.com/prometheus-operator/prometheus-operator/issues/6136 Signed-off-by: Pranshu Srivastava <rexagod@gmail.com>
2025-04-16 09:16:38 +00:00 · 2024-06-24 01:17:45 +05:30 · 2024-06-24 01:17:45 +05:30 · 0002aace2d
commit 0002aace2d
parent 2c1cba5b8b
8 changed files with 21 additions and 20 deletions
--- a/Documentation/proposals/202310-shard-autoscaling.md
+++ b/Documentation/proposals/202310-shard-autoscaling.md
@ -129,7 +129,7 @@ We intentionally don't want to spin up new instances while others that are marke

 Prometheus Agents are different than servers since queries are not available in this mode. Their only responsibility is scraping metrics and pushing them via remote-write to a long-term storage backend, making the scale-down experience much easier to handle.

-When receiving the SIGTERM signal, the Prometheus Agent should gracefully handle the signal by finishing all remote-write queues before ending the process. Prometheus-Operator, by default, adjusts the [Graceful Termination Period](https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-termination) of Prometheus/PrometheusAgent pods to 600s. Ten minutes should be enough for them to flush the remote-write queue, but, if needed, users can redefine Graceful Termination Period using [Strategic Merge Patch](https://prometheus-operator.dev/docs/operator/strategic-merge-patch/).
+When receiving the SIGTERM signal, the Prometheus Agent should gracefully handle the signal by finishing all remote-write queues before ending the process. Prometheus-Operator, by default, adjusts the [Graceful Termination Period](https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-termination) of Prometheus/PrometheusAgent pods to 600s. Ten minutes should be enough for them to flush the remote-write queue, but, if needed, users can redefine Graceful Termination Period using [Strategic Merge Patch](https://prometheus-operator.dev/docs/platform/strategic-merge-patch/).

 Since there's no use case for retaining Prometheus Agents, its CRD will not be extended with the `RetentionPolicy` mentioned in [Graceful scale-down of Prometheus Servers](#graceful-scale-down-of-prometheus-servers)

--- a/Documentation/proposals/202405-agent-daemonset.md
+++ b/Documentation/proposals/202405-agent-daemonset.md
@ -64,7 +64,7 @@ Currently, we already have a PrometheusAgent CRD that supports StatefulSet deplo

 The reason for enhancing existing CRD (instead of introducing a new CRD) is it would take less time to finish the MVP. We’ll let users experiment with the MVP, and in case users report a separate CRD is needed, we’ll separate the logic of DaemonSet deployment into a new CRD later.

-The current [PrometheusAgent CRD](https://prometheus-operator.dev/docs/operator/api/#monitoring.coreos.com/v1alpha1.PrometheusAgent) already has sufficient fields for the DaemonSet deployment. The DaemonSet deployment can use all the existing fields in the CRD except the ones related to:
+The current [PrometheusAgent CRD](https://prometheus-operator.dev/docs/platform/prometheus-agent/) already has sufficient fields for the DaemonSet deployment. The DaemonSet deployment can use all the existing fields in the CRD except the ones related to:
 * Selectors for service, probe, ScrapeConfig
 * Replica
 * Shard
--- a/Documentation/proposals/202405-docs-restructuring.md
+++ b/Documentation/proposals/202405-docs-restructuring.md
@ -22,7 +22,7 @@ After we have a proper structure, it will become relatively easy to add informat

 A good documentation is one that is easy to understand for a newcomer and provides the exact amount of information that is needed according to the need. But looking at the current documentation structure, a lot of topics seem misplaced. For example, there is no need for a **"Contributing"** page in the prologue section. Prologue section should only give the introduction and the prerequisites for the project. Due to this, a user might need to put more effort to search for relevant information and this might decrease the user's productivity.

-Currently, there is some unnecessary information from the documentation creating misconceptions in the user's mind. For example, let us look at [#6046](https://github.com/prometheus-operator/prometheus-operator/issues/6046) which tells us that there is no Ingress Guide present in the current documentation. But, if we look at the website, there are links to the **“Ingress Guide”** on the [Getting-Started](https://prometheus-operator.dev/docs/user-guides/getting-started/#exposing-the-prometheus-service) and [Alerting page](https://prometheus-operator.dev/docs/user-guides/alerting/#exposing-the-alertmanager-service) in **User-Guide**. Due to this, many users will report the same issue and it will take time for maintainers to resolve them.
+Currently, there is some unnecessary information from the documentation creating misconceptions in the user's mind. For example, let us look at [#6046](https://github.com/prometheus-operator/prometheus-operator/issues/6046) which tells us that there is no Ingress Guide present in the current documentation. But, if we look at the website, there are links to the **“Ingress Guide”** on the [Getting-Started](https://prometheus-operator.dev/docs/developer/getting-started/#exposing-the-prometheus-service) and [Alerting page](https://prometheus-operator.dev/docs/developer/alerting/#exposing-the-alertmanager-service) in **User-Guide**. Due to this, many users will report the same issue and it will take time for maintainers to resolve them.

 Incorporation of new topics is also difficult if the structure is not up to mark because more time and effort is needed to decide the best place to add a topic which can often lead to decrease in productivity of a maintainer. For example, in issue [#3553](https://github.com/prometheus-operator/prometheus-operator/issues/3553#issuecomment-726733177), it has been mentioned that basic architecture needs to be worked upon before adding the diagram as there is no section talking about **“namespace selection”** in the current documentation.

--- a/Documentation/user-guides/scrapeconfig.md
+++ b/Documentation/user-guides/scrapeconfig.md
@ -43,7 +43,7 @@ to generate scrape configurations.
 * `kubernetes_sd`
 * `consul_sd`

-The following examples are basic and don't cover all the supported service discovery mechanisms. The CRD is constantly evolving, adding new features and support for new Service Discoveries. Check the [API documentation](https://prometheus-operator.dev/docs/operator/api/#monitoring.coreos.com/v1alpha1.ScrapeConfig) to see all supported fields.
+The following examples are basic and don't cover all the supported service discovery mechanisms. The CRD is constantly evolving, adding new features and support for new Service Discoveries. Check the [API documentation](https://prometheus-operator.dev/docs/developer/scrapeconfig/) to see all supported fields.

 If you have an interest in another service discovery mechanism or you see something missing in the implementation, please
 [open an issue](https://github.com/prometheus-operator/prometheus-operator/issues).
--- a/README.md
+++ b/README.md
@ -95,7 +95,7 @@ The Prometheus operator automatically detects changes in the Kubernetes API serv
 matching deployments and configurations are kept in sync.

 To learn more about the CRDs introduced by the Prometheus Operator have a look
-at the [design](https://prometheus-operator.dev/docs/operator/design/) page.
+at the [design](https://prometheus-operator.dev/docs/getting-started/design/) page.

 ## Dynamic Admission Control

@ -103,7 +103,7 @@ To prevent invalid Prometheus alerting and recording rules from causing failures
 an [admission webhook](https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/)
 is provided to validate `PrometheusRule` resources upon initial creation or update.

-For more information on this feature, see the [user guide](https://prometheus-operator.dev/docs/user-guides/webhook/).
+For more information on this feature, see the [user guide](https://prometheus-operator.dev/docs/platform/webhook/).

 ## Quickstart

--- a/cmd/prometheus-config-reloader/main.go
+++ b/cmd/prometheus-config-reloader/main.go
@ -157,12 +157,13 @@ func main() {

 	{
 		opts := reloader.Options{
-			CfgFile:       *cfgFile,
-			CfgOutputFile: *cfgSubstFile,
-			WatchedDirs:   *watchedDir,
-			DelayInterval: *delayInterval,
-			WatchInterval: *watchInterval,
-			RetryInterval: *retryInterval,
+			CfgFile:                       *cfgFile,
+			CfgOutputFile:                 *cfgSubstFile,
+			WatchedDirs:                   *watchedDir,
+			DelayInterval:                 *delayInterval,
+			WatchInterval:                 *watchInterval,
+			RetryInterval:                 *retryInterval,
+			TolerateEnvVarExpansionErrors: true,
 		}

 		switch *reloadMethod {
--- a/go.mod
+++ b/go.mod
@ -29,7 +29,7 @@ require (
 	github.com/prometheus/exporter-toolkit v0.11.0
 	github.com/prometheus/prometheus v0.53.0
 	github.com/stretchr/testify v1.9.0
-	github.com/thanos-io/thanos v0.35.1
+	github.com/thanos-io/thanos v0.0.0-20240702084127-fcc88c028acc
 	go.uber.org/automaxprocs v1.5.3
 	golang.org/x/exp v0.0.0-20240613232115-7f521ea00fb8
 	golang.org/x/net v0.26.0
@ -101,7 +101,7 @@ require (
 	github.com/josharian/intern v1.0.0 // indirect
 	github.com/jpillora/backoff v1.0.0 // indirect
 	github.com/json-iterator/go v1.1.12 // indirect
-	github.com/klauspost/cpuid/v2 v2.2.5 // indirect
+	github.com/klauspost/cpuid/v2 v2.2.8 // indirect
 	github.com/mailru/easyjson v0.7.7 // indirect
 	github.com/metalmatze/signal v0.0.0-20210307161603-1c9aa721a97a // indirect
 	github.com/minio/sha256-simd v1.0.1 // indirect
--- a/go.sum
+++ b/go.sum
@ -265,10 +265,10 @@ github.com/julienschmidt/httprouter v1.2.0/go.mod h1:SYymIcj16QtmaHHD7aYtjjsJG7V
 github.com/julienschmidt/httprouter v1.3.0/go.mod h1:JR6WtHb+2LUe8TCKY3cZOxFyyO8IZAc4RVcycCCAKdM=
 github.com/kisielk/errcheck v1.5.0/go.mod h1:pFxgyoBC7bSaBwPgfKdkLd5X25qrDl4LWUI2bnpBCr8=
 github.com/kisielk/gotool v1.0.0/go.mod h1:XhKaO+MFFWcvkIS/tQcRk01m1F5IRFswLeQ+oQHNcck=
-github.com/klauspost/compress v1.17.8 h1:YcnTYrq7MikUT7k0Yb5eceMmALQPYBW/Xltxn0NAMnU=
-github.com/klauspost/compress v1.17.8/go.mod h1:Di0epgTjJY877eYKx5yC51cX2A2Vl2ibi7bDH9ttBbw=
-github.com/klauspost/cpuid/v2 v2.2.5 h1:0E5MSMDEoAulmXNFquVs//DdoomxaoTY1kUhbc/qbZg=
-github.com/klauspost/cpuid/v2 v2.2.5/go.mod h1:Lcz8mBdAVJIBVzewtcLocK12l3Y+JytZYpaMropDUws=
+github.com/klauspost/compress v1.17.9 h1:6KIumPrER1LHsvBVuDa0r5xaG0Es51mhhB9BQB2qeMA=
+github.com/klauspost/compress v1.17.9/go.mod h1:Di0epgTjJY877eYKx5yC51cX2A2Vl2ibi7bDH9ttBbw=
+github.com/klauspost/cpuid/v2 v2.2.8 h1:+StwCXwm9PdpiEkPyzBXIy+M9KUb4ODm0Zarf1kS5BM=
+github.com/klauspost/cpuid/v2 v2.2.8/go.mod h1:Lcz8mBdAVJIBVzewtcLocK12l3Y+JytZYpaMropDUws=
 github.com/konsorten/go-windows-terminal-sequences v1.0.1/go.mod h1:T0+1ngSBFLxvqU3pZ+m/2kptfBszLMUkC4ZK/EgS/cQ=
 github.com/konsorten/go-windows-terminal-sequences v1.0.3/go.mod h1:T0+1ngSBFLxvqU3pZ+m/2kptfBszLMUkC4ZK/EgS/cQ=
 github.com/kr/logfmt v0.0.0-20140226030751-b84e30acd515/go.mod h1:+0opPa2QZZtGFBFZlji/RkVcI2GknAs/DXo4wKdlNEc=
@ -399,8 +399,8 @@ github.com/stretchr/testify v1.5.1/go.mod h1:5W2xD1RspED5o8YsWQXVCued0rvSQ+mT+I5
 github.com/stretchr/testify v1.7.0/go.mod h1:6Fq8oRcR53rry900zMqJjRRixrwX3KX962/h/Wwjteg=
 github.com/stretchr/testify v1.9.0 h1:HtqpIVDClZ4nwg75+f6Lvsy/wHu+3BoSGCbBAcpTsTg=
 github.com/stretchr/testify v1.9.0/go.mod h1:r2ic/lqez/lEtzL7wO/rwa5dbSLXVDPFyf8C91i36aY=
-github.com/thanos-io/thanos v0.35.1 h1:j07RPGjAe0Bhe5ceO0mSRetdkCxzCznJXXRdQqGGyao=
-github.com/thanos-io/thanos v0.35.1/go.mod h1:WHGZyM/qwp857mJr8Q0d7K6eQoLtLv+6p7RNpT/yeIE=
+github.com/thanos-io/thanos v0.0.0-20240702084127-fcc88c028acc h1:Bcc0WmbYgJ3r7jy3zDHJBC0IK7Sn9Yzt+PvbbqT94XM=
+github.com/thanos-io/thanos v0.0.0-20240702084127-fcc88c028acc/go.mod h1:f7LiW4+/xvV5+gkseMuVbQnrbFTFnCPv5+X1M6mXkn4=
 github.com/xhit/go-str2duration/v2 v2.1.0 h1:lxklc02Drh6ynqX+DdPyp5pCKLUQpRT8bp8Ydu2Bstc=
 github.com/xhit/go-str2duration/v2 v2.1.0/go.mod h1:ohY8p+0f07DiV6Em5LKB0s2YpLtXVyJfNt1+BlmyAsU=
 github.com/yuin/goldmark v1.1.25/go.mod h1:3hX8gzYuyVAZsxl0MRgGTJEmQBFcNTphYh9decYSb74=