

Reload thanos-rule on new pyrra rules deployed
Closed, Resolved (Public)

Description

Filing from IRC

09:04  <godog> herron: looks like thanos-rule on titan2001 never got reloaded
09:10  <elukey> sigh
09:10  <elukey> Maybe we could add a notify in puppet so when pyrra reloads it will signal thanos-rule as well?
09:23  <elukey> reloaded thanos-rule on 2001
09:26  <godog> yes that's what needs to happen, for some reason I thought that was the case already

Event Timeline

In theory pyrra should do the right thing, but we're running into a couple of issues in our current deployment:

  • The pyrra filesystem operator watches /etc/pyrra/config and will automatically pick up new YAML files placed there. In practice, however, this errors when puppet places its temporary file, e.g. pyrra-filesystem[2200461]: msg="ignoring non YAML file" file=/etc/pyrra/config/varnish-requests.yaml20231219
  • Output rules are generated automatically by the filesystem operator, which then attempts to reload prometheus. Because we're using thanos, that reload fails (which is where the above issue with upstream comes in)
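
To illustrate the first point, here is a minimal Python sketch of an extension-based filter. It is an assumption about why the temporary file gets skipped, not pyrra's actual code (pyrra is written in Go and its check may differ). The function name is_yaml_config and the accepted suffix set are hypothetical.

```python
from pathlib import PurePosixPath

# Assumption: only these suffixes count as YAML config files.
YAML_SUFFIXES = {".yaml", ".yml"}


def is_yaml_config(path: str) -> bool:
    """Return True if the file name ends in a recognised YAML suffix.

    The suffix is everything from the last dot onwards, so a temporary
    name such as 'varnish-requests.yaml20231219' has the suffix
    '.yaml20231219' and is rejected.
    """
    return PurePosixPath(path).suffix in YAML_SUFFIXES


if __name__ == "__main__":
    for name in (
        "/etc/pyrra/config/varnish-requests.yaml",
        "/etc/pyrra/config/varnish-requests.yaml20231219",
    ):
        print(name, is_yaml_config(name))
```

Under this assumption, the watcher sees the temporary name first and skips it because of the date appended after ".yaml".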

Having puppet reload pyrra-filesystem and thanos-rule whenever it deploys a config change should help. The only concern is the thanos-rule reload racing against pyrra's generation of the output rules, but let's try it and see how it goes
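
If the race does turn out to be a problem, one possible mitigation (not what the patch does) is to wait until the generated rules file is at least as new as the config before reloading thanos-rule. A rough Python sketch follows. The function names, the paths passed in, and the timeout values are all hypothetical, and the reload itself is left to the caller.

```python
import os
import time


def output_is_fresh(config_path: str, output_path: str) -> bool:
    """True if the generated rules file exists and is at least as new as the config."""
    try:
        return os.stat(output_path).st_mtime >= os.stat(config_path).st_mtime
    except FileNotFoundError:
        # Either file is missing, so the output cannot be considered fresh.
        return False


def wait_for_fresh_output(config_path: str, output_path: str,
                          timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll until the output rules are fresh or the timeout expires.

    Returns True when it is safe to reload, False if we gave up waiting.
    """
    deadline = time.monotonic() + timeout
    while True:
        if output_is_fresh(config_path, output_path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A caller would only trigger the thanos-rule reload when wait_for_fresh_output returns True. Comparing mtimes is a heuristic: it says the output was written after the config, not that generation completed successfully.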

Change 984220 had a related patch set uploaded (by Herron; author: Herron):

[operations/puppet@production] pyrra: reload pyrra-filesystem and thanos-rule on cfg change

https://gerrit.wikimedia.org/r/984220

Change 984220 merged by Herron:

[operations/puppet@production] pyrra: reload pyrra-filesystem and thanos-rule on cfg change

https://gerrit.wikimedia.org/r/984220

Change 990126 had a related patch set uploaded (by Herron; author: Herron):

[operations/puppet@production] thanos::rule: set reload service to stopped

https://gerrit.wikimedia.org/r/990126

Change 990126 merged by Herron:

[operations/puppet@production] thanos::rule: set reload service to stopped

https://gerrit.wikimedia.org/r/990126

Can confirm we're good in terms of puppet runs not changing state at every run:

titan1001:~$ pat
Info: Using environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Info: Caching catalog for titan1001.eqiad.wmnet
Info: Applying configuration version '(996c73dc41) Taavi Väänänen - P:toolforge::mailrelay: reject mail not using Toolforge domains'
Notice: Applied catalog in 24.79 seconds

herron claimed this task.

I think we're in good shape here; please reopen if anything else is needed.