elukey (Luca Toscano)
Site Reliability Engineer - Machine Learning

User Details

User Since: Jan 5 2016, 9:54 PM (451 w, 5 d)
Availability: Available
LDAP User: Unknown
MediaWiki User: LToscano (WMF) [ Global Accounts ]

Recent Activity

Today

elukey added a comment to T373810: sre.hosts.reimage fails when the node is already in puppet db but has no facts (puppet never ran).

We had a chat with the team about this:

Mon, Sep 2, 3:13 PM · Infrastructure-Foundations, SRE-tools
elukey closed T260664: Create a cookbook for applying an apache config change safely as Declined.

We are reviewing our tasks and this no longer seems relevant in the context of MW on K8s, so we decided to close it, but please re-open if you feel something is needed!

Mon, Sep 2, 3:01 PM · Infrastructure-Foundations, SRE-tools, serviceops, SRE
elukey added a comment to T373783: Integrate Bookworm 12.7 point update.

I updated the bullseye and bookworm netinst images this morning but didn't know about upgrading the packages for the point release :(

Mon, Sep 2, 2:40 PM · Infrastructure-Foundations, SRE
elukey added a watcher for Infrastructure-Foundations: elukey.
Mon, Sep 2, 2:39 PM
elukey triaged T373432: Some of the packages present in the Docker registry are not visible in Debmonitor as Medium priority.

I think I know why this is happening, didn't realize till now. Docker Reporter, the tool that scans the registry for images and calls debmonitor for them, has the following configs:

Mon, Sep 2, 1:47 PM · User-Elukey, Infrastructure-Foundations
elukey removed a project from T373534: Migrate the ownership of DPE-Owned Docker images in production-images repo to mailing lists: Infrastructure-Foundations.
Mon, Sep 2, 1:39 PM · Data-Platform-SRE, Machine-Learning-Team, serviceops, Security
elukey added a comment to T371890: pynetbox incompatibility with Netbox >= 4.0.6.

Follow up: https://gerrit.wikimedia.org/r/c/operations/software/spicerack/+/1070028 - Spicerack's setup.py needed to be updated to avoid build failures.

Mon, Sep 2, 1:37 PM · Patch-For-Review, Infrastructure-Foundations, netbox
elukey triaged T373794: Spicerack errors out when building without connectivity as High priority.
Mon, Sep 2, 1:30 PM · SRE-tools, Spicerack, Infrastructure-Foundations
elukey added a comment to T373794: Spicerack errors out when building without connectivity.

I tried to build 8.10.0 to make sure that it wasn't related to the 8.11.0 changes, and I get the same result.

Mon, Sep 2, 1:09 PM · SRE-tools, Spicerack, Infrastructure-Foundations
elukey updated subscribers of T373794: Spicerack errors out when building without connectivity.
Mon, Sep 2, 10:53 AM · SRE-tools, Spicerack, Infrastructure-Foundations
elukey created T373794: Spicerack errors out when building without connectivity.
Mon, Sep 2, 10:52 AM · SRE-tools, Spicerack, Infrastructure-Foundations
elukey added a comment to T370203: Install Matomo Custom Reports Plugin for wikimediafoundation.org.

I appreciate everyone's feedback on the subject, and I want to make clear that my team (Data Platform SRE) is firmly committed to open source principles, and that my colleagues spent a considerable amount of effort attempting to find an open-source alternative in accordance with the guiding principles.

Mon, Sep 2, 10:42 AM · Software-Licensing, Data-Platform-SRE (2024.08.17 - 2024.09.06)
elukey added a comment to T373527: puppetserver1002 thrashing and requiring a power cycle as a result.

Next steps:

Mon, Sep 2, 9:18 AM · User-Elukey, Infrastructure-Foundations, SRE

Fri, Aug 30

elukey added a comment to T365372: Spicerack: expand Supermicro support in the Redfish module.

Next steps:

Fri, Aug 30, 3:26 PM · Patch-For-Review, DC-Ops, Infrastructure-Foundations, SRE-tools, User-Elukey, Spicerack
elukey closed T367427: Cleanup old Docker images running Debian Stretch/Jessie/Buster as Resolved.

A lot of cleanup has been done, so far it seems that the task can be closed. We have a better way to figure out what images are not supported by the docker reporter now, so it is also good.

Fri, Aug 30, 1:38 PM · Patch-For-Review, User-Elukey, Infrastructure-Foundations, serviceops
elukey closed T367427: Cleanup old Docker images running Debian Stretch/Jessie/Buster, a subtask of T368366: Upgrade K8s docker images running in Wikimedia production on Buster to either Bullseye or Bookworm, as Resolved.
Fri, Aug 30, 1:36 PM · serviceops, Security, Infrastructure-Foundations
elukey added a comment to T373527: puppetserver1002 thrashing and requiring a power cycle as a result.

After checking the JVM's committed vs. used memory, it seems to me that we allocate more than we need on average, for a grand total of ~52GB (I think 48GB of pre-allocated heap plus some extra GBs for the Metaspace).

Fri, Aug 30, 1:08 PM · User-Elukey, Infrastructure-Foundations, SRE

Thu, Aug 29

elukey added a comment to T373527: puppetserver1002 thrashing and requiring a power cycle as a result.

Created https://grafana-rw.wikimedia.org/d/e0f6afe3-2aea-483d-9f5e-55f0cba9207f/puppetserver, didn't add all the metrics but it should be a good start to figure out what's happening.

Thu, Aug 29, 3:26 PM · User-Elukey, Infrastructure-Foundations, SRE
elukey added a comment to T373527: puppetserver1002 thrashing and requiring a power cycle as a result.

Something very strange:

Thu, Aug 29, 2:24 PM · User-Elukey, Infrastructure-Foundations, SRE
elukey moved T369491: Migrate aux cluster off of Pod Security Policies from Backlog to In Progress on the User-Elukey board.
Thu, Aug 29, 1:32 PM · Patch-For-Review, User-Elukey, Infrastructure-Foundations, Kubernetes
elukey moved T373526: Migrate the ownership of Docker images in production-images repo to mailing lists from Backlog to Waiting for others on the User-Elukey board.
Thu, Aug 29, 1:32 PM · Patch-For-Review, User-Elukey, Data-Platform-SRE, Machine-Learning-Team, serviceops, Infrastructure-Foundations
elukey moved T373527: puppetserver1002 thrashing and requiring a power cycle as a result from Backlog to In Progress on the User-Elukey board.
Thu, Aug 29, 1:32 PM · User-Elukey, Infrastructure-Foundations, SRE
elukey moved T368744: Allow debmonitor to store the Debian version-id in the OS field from In Progress to Stalled on the User-Elukey board.
Thu, Aug 29, 1:31 PM · Patch-For-Review, SRE-tools, User-Elukey, Infrastructure-Foundations
elukey triaged T373527: puppetserver1002 thrashing and requiring a power cycle as a result as High priority.
Thu, Aug 29, 1:19 PM · User-Elukey, Infrastructure-Foundations, SRE
elukey added a project to T373527: puppetserver1002 thrashing and requiring a power cycle as a result: Infrastructure-Foundations.

Thanks a lot for the task!

Thu, Aug 29, 1:05 PM · User-Elukey, Infrastructure-Foundations, SRE
elukey added a project to T373527: puppetserver1002 thrashing and requiring a power cycle as a result: User-Elukey.
Thu, Aug 29, 1:05 PM · User-Elukey, Infrastructure-Foundations, SRE
elukey added a comment to T373526: Migrate the ownership of Docker images in production-images repo to mailing lists.

I see one problem with this approach. Teams change: their names, their compositions, their email addresses and so on, especially during re-orgs, and in the past we've done a lot of those in the WMF. If this is to happen, really stable team names need to be chosen.

Thu, Aug 29, 12:52 PM · Patch-For-Review, User-Elukey, Data-Platform-SRE, Machine-Learning-Team, serviceops, Infrastructure-Foundations

Wed, Aug 28

elukey edited projects for T373526: Migrate the ownership of Docker images in production-images repo to mailing lists, added: User-Elukey; removed Security.
Wed, Aug 28, 3:20 PM · Patch-For-Review, User-Elukey, Data-Platform-SRE, Machine-Learning-Team, serviceops, Infrastructure-Foundations
elukey added a comment to T345070: Attach opencontainers image metadata to docker images.

Created T371549 to move the Maintainer field of the production-images repo's control files to team-specific (where possible). Ideally this should be the info that docker-pkg will use to publish the label about who to contact.

Wed, Aug 28, 2:25 PM · User-MoritzMuehlenhoff, User-Elukey, Release-Engineering-Team, serviceops, docker-pkg
Joe awarded T373526: Migrate the ownership of Docker images in production-images repo to mailing lists a Like token.
Wed, Aug 28, 2:24 PM · Patch-For-Review, User-Elukey, Data-Platform-SRE, Machine-Learning-Team, serviceops, Infrastructure-Foundations
elukey added projects to T373526: Migrate the ownership of Docker images in production-images repo to mailing lists: Machine-Learning-Team, Data-Platform-SRE.
Wed, Aug 28, 2:17 PM · Patch-For-Review, User-Elukey, Data-Platform-SRE, Machine-Learning-Team, serviceops, Infrastructure-Foundations
elukey created T373526: Migrate the ownership of Docker images in production-images repo to mailing lists.
Wed, Aug 28, 2:11 PM · Patch-For-Review, User-Elukey, Data-Platform-SRE, Machine-Learning-Team, serviceops, Infrastructure-Foundations
elukey added a comment to T216826: Move Kartotherian to Kubernetes.

Hi @MSantos! Ack, thanks for the info, will keep it in mind when hopefully upgrading the maps nodes :) Any news about the Kartotherian dependency upgrade?

Wed, Aug 28, 1:36 PM · Content-Transform-Team, Patch-For-Review, WMDE-TechWish-Sprint-2022-11-29, serviceops, WMDE-TechWish-Sprint-2022-11-09, Platform Engineering, WMDE-TechWish-Sprint-2022-10-26, WMDE-TechWish-Maintenance, WMDE-GeoInfo-FocusArea, Epic, Maps (Kartotherian)
elukey created T373519: Allow UEFI DHCP configs.
Wed, Aug 28, 1:05 PM · Infrastructure-Foundations

Tue, Aug 27

elukey moved T372485: Spicerack's tox config times out all the time after T342019 from In Progress to Waiting for others on the User-Elukey board.
Tue, Aug 27, 3:04 PM · Patch-For-Review, User-Elukey, Release-Engineering-Team, Infrastructure-Foundations
elukey closed T371132: Provision cookbook not setting serial console and other settings as Resolved.
Tue, Aug 27, 3:03 PM · Data-Persistence, User-Elukey, DC-Ops, Infrastructure-Foundations
elukey updated the task description for T371132: Provision cookbook not setting serial console and other settings.
Tue, Aug 27, 3:02 PM · Data-Persistence, User-Elukey, DC-Ops, Infrastructure-Foundations
elukey added a comment to T372485: Spicerack's tox config times out all the time after T342019.

@hashar thanks a lot for the investigation! Would it be possible to have an option in the jjb config to allow Python jobs to specify the list of tox environments to execute? If so, we could set the py39 ones for spicerack as a special use case, and update them when the cumin nodes are upgraded.

Tue, Aug 27, 2:18 PM · Patch-For-Review, User-Elukey, Release-Engineering-Team, Infrastructure-Foundations
elukey added a comment to T373417: db2230, db2231 and db2232 reimage failure.

@Marostegui hey, Papaul's on vacation this week. From what I remember, that is a 10G issue. We started using this flag in the reimage script to keep it from coming up:
--force-dhcp-tftp

Tue, Aug 27, 2:12 PM · ops-codfw, DBA, DC-Ops
elukey added a comment to T368744: Allow debmonitor to store the Debian version-id in the OS field.

Today I cleaned up some db nodes reported as debmonitor client failures while I was on holiday:

Tue, Aug 27, 1:30 PM · Patch-For-Review, SRE-tools, User-Elukey, Infrastructure-Foundations
elukey added a comment to T370203: Install Matomo Custom Reports Plugin for wikimediafoundation.org.

@mark @odimitrijevic thanks a lot for the explanations and the rationale. I am not 100% happy with the outcome but I will not oppose anymore :) In the future we'd love to get involved sooner (we == SRE team) rather than later in these kinds of discussions, to help as much as possible and provide more options (if available).

Tue, Aug 27, 1:10 PM · Software-Licensing, Data-Platform-SRE (2024.08.17 - 2024.09.06)
elukey added a comment to T372825: Unexpected helmfile changes when attempting a k8s deployment for a miscweb site.

@sbassett Hi! Please subscribe to the ops mailing list so you can get notified of these changes; usually we post a message there to warn users. In this case it is safe to deploy since it is just a rebuild to use a new OS (Bookworm) and skip Buster.

Tue, Aug 27, 12:47 PM · serviceops

Mon, Aug 26

elukey renamed T373369: Service puppetmaster1001:8141 has failed probes (http_puppetmaster1003_eqiad_wmnet_backend_https_ip4) from Service puppetmaster1003:8141 has failed probes (http_puppetmaster1003_eqiad_wmnet_backend_https_ip4) to Service puppetmaster1001:8141 has failed probes (http_puppetmaster1003_eqiad_wmnet_backend_https_ip4).
Mon, Aug 26, 4:56 PM · observability, Infrastructure-Foundations
elukey created T373369: Service puppetmaster1001:8141 has failed probes (http_puppetmaster1003_eqiad_wmnet_backend_https_ip4).
Mon, Aug 26, 4:56 PM · observability, Infrastructure-Foundations
elukey added a comment to T372472: docker-registry.wikimedia.org/dcl-puppet-pki fails to install debmonitor-client.

I'm a little surprised we haven't seen this on any bullseye hosts in production, but perhaps we haven't provisioned a new host or added a user since we exceeded the length limit?

Mon, Aug 26, 3:37 PM · Patch-For-Review, Infrastructure-Foundations
elukey moved T184435: Puppet tox: properly lint both Py2 and Py3 files from Backlog to In Progress on the User-Elukey board.
Mon, Aug 26, 3:01 PM · User-Elukey, Infrastructure-Foundations, Python3-Porting, SRE-tools, SRE
elukey added a project to T184435: Puppet tox: properly lint both Py2 and Py3 files: User-Elukey.
Mon, Aug 26, 2:58 PM · User-Elukey, Infrastructure-Foundations, Python3-Porting, SRE-tools, SRE
elukey lowered the priority of T368023: Move the private Puppet repository to puppetserver1001 from High to Medium.

Left to do:

Mon, Aug 26, 2:53 PM · Patch-For-Review, User-Elukey, Puppet-Infrastructure, SRE, Infrastructure-Foundations
elukey triaged T372485: Spicerack's tox config times out all the time after T342019 as High priority.
Mon, Aug 26, 2:42 PM · Patch-For-Review, User-Elukey, Release-Engineering-Team, Infrastructure-Foundations

Fri, Aug 16

elukey added a comment to T365372: Spicerack: expand Supermicro support in the Redfish module.

Currently blocked by T372485

Fri, Aug 16, 4:30 PM · Patch-For-Review, DC-Ops, Infrastructure-Foundations, SRE-tools, User-Elukey, Spicerack
elukey updated subscribers of T370203: Install Matomo Custom Reports Plugin for wikimediafoundation.org.

Hi! I am Luca from the SRE Infrastructure Foundations team, I had a chat with Ben today about https://gerrit.wikimedia.org/r/c/operations/puppet/+/1062401 and the need to deploy this extra plugin.

Fri, Aug 16, 2:03 PM · Software-Licensing, Data-Platform-SRE (2024.08.17 - 2024.09.06)

Wed, Aug 14

elukey added a project to T372485: Spicerack's tox config times out all the time after T342019: User-Elukey.
Wed, Aug 14, 4:27 PM · Patch-For-Review, User-Elukey, Release-Engineering-Team, Infrastructure-Foundations
elukey moved T372485: Spicerack's tox config times out all the time after T342019 from Backlog to In Progress on the User-Elukey board.
Wed, Aug 14, 4:27 PM · Patch-For-Review, User-Elukey, Release-Engineering-Team, Infrastructure-Foundations
elukey updated the task description for T372485: Spicerack's tox config times out all the time after T342019.
Wed, Aug 14, 4:24 PM · Patch-For-Review, User-Elukey, Release-Engineering-Team, Infrastructure-Foundations
elukey added a project to T372485: Spicerack's tox config times out all the time after T342019: Release-Engineering-Team.
Wed, Aug 14, 4:23 PM · Patch-For-Review, User-Elukey, Release-Engineering-Team, Infrastructure-Foundations
elukey created T372485: Spicerack's tox config times out all the time after T342019.
Wed, Aug 14, 4:20 PM · Patch-For-Review, User-Elukey, Release-Engineering-Team, Infrastructure-Foundations
elukey reopened T368744: Allow debmonitor to store the Debian version-id in the OS field, a subtask of T367427: Cleanup old Docker images running Debian Stretch/Jessie/Buster, as Open.
Wed, Aug 14, 2:38 PM · Patch-For-Review, User-Elukey, Infrastructure-Foundations, serviceops
elukey reopened T368744: Allow debmonitor to store the Debian version-id in the OS field as "Open".

Found another issue:

Wed, Aug 14, 2:37 PM · Patch-For-Review, SRE-tools, User-Elukey, Infrastructure-Foundations
elukey added a comment to T368023: Move the private Puppet repository to puppetserver1001.

Filed https://gitlab.wikimedia.org/repos/sre/conftool/-/merge_requests/24 to fix a conftool issue:

Wed, Aug 14, 2:13 PM · Patch-For-Review, User-Elukey, Puppet-Infrastructure, SRE, Infrastructure-Foundations
elukey created T372472: docker-registry.wikimedia.org/dcl-puppet-pki fails to install debmonitor-client.
Wed, Aug 14, 2:04 PM · Patch-For-Review, Infrastructure-Foundations
elukey added a comment to T300102: Upgrade Kafka to from 1.x to later version.

@brouberol after T355550 do we have any plans to start testing the upgrade on kafka-test or similar? I can help if needed :)

Wed, Aug 14, 12:17 PM · Data-Platform-SRE, Event-Platform, Epic, Data-Engineering, SRE, observability, serviceops

Tue, Aug 13

elukey added a comment to T365372: Spicerack: expand Supermicro support in the Redfish module.

Encountered an issue with the BMC's network config:

Tue, Aug 13, 3:44 PM · Patch-For-Review, DC-Ops, Infrastructure-Foundations, SRE-tools, User-Elukey, Spicerack
elukey added a comment to T365167: Q4:rack/setup/install sretest2001.

@Jhancock.wm Hi! Feel free to leave this host to me, still trying to figure out some stuff :(

Tue, Aug 13, 3:26 PM · SRE, Infrastructure-Foundations, ops-codfw, DC-Ops
elukey closed T368744: Allow debmonitor to store the Debian version-id in the OS field, a subtask of T367427: Cleanup old Docker images running Debian Stretch/Jessie/Buster, as Resolved.
Tue, Aug 13, 12:58 PM · Patch-For-Review, User-Elukey, Infrastructure-Foundations, serviceops
elukey closed T368744: Allow debmonitor to store the Debian version-id in the OS field as Resolved.
Tue, Aug 13, 12:58 PM · Patch-For-Review, SRE-tools, User-Elukey, Infrastructure-Foundations
elukey moved T368023: Move the private Puppet repository to puppetserver1001 from In Progress to Waiting for others on the User-Elukey board.
Tue, Aug 13, 12:57 PM · Patch-For-Review, User-Elukey, Puppet-Infrastructure, SRE, Infrastructure-Foundations
elukey moved T371899: Review how the debmonitor server processes hosts/images when starting fresh from Backlog to Waiting for others on the User-Elukey board.
Tue, Aug 13, 12:57 PM · User-Elukey, Infrastructure-Foundations
elukey moved T363576: Broadcom NICs with recent firmware fail to reimage from Backlog to Waiting for others on the User-Elukey board.
Tue, Aug 13, 12:57 PM · User-Elukey, DC-Ops, ops-codfw, Infrastructure-Foundations, SRE

Mon, Aug 12

elukey created T372289: ms-be1078 has no connectivity.
Mon, Aug 12, 3:07 PM · SRE-swift-storage, SRE, DC-Ops, ops-eqiad
elukey added a comment to T368023: Move the private Puppet repository to puppetserver1001.

Move done, tested the new "disarmed" pre-commit hook on puppetmaster1001 and a commit on puppetserver1001.

Mon, Aug 12, 1:49 PM · Patch-For-Review, User-Elukey, Puppet-Infrastructure, SRE, Infrastructure-Foundations

Thu, Aug 8

elukey closed T368359: Upgrade Knative control plane Docker images to Bullseye/Bookworm, a subtask of T368366: Upgrade K8s docker images running in Wikimedia production on Buster to either Bullseye or Bookworm, as Resolved.
Thu, Aug 8, 3:04 PM · serviceops, Security, Infrastructure-Foundations
elukey closed T368359: Upgrade Knative control plane Docker images to Bullseye/Bookworm as Resolved.
Thu, Aug 8, 3:04 PM · Machine-Learning-Team

Wed, Aug 7

elukey added a comment to T368023: Move the private Puppet repository to puppetserver1001.

Sent an email to all SREs, the move will happen on Aug 12th 13:00 UTC.

Wed, Aug 7, 2:31 PM · Patch-For-Review, User-Elukey, Puppet-Infrastructure, SRE, Infrastructure-Foundations
elukey closed T360356: Request access to servers Dcops group as Resolved.
Wed, Aug 7, 1:26 PM · User-Elukey, SRE, Infrastructure-Foundations
elukey moved T363576: Broadcom NICs with recent firmware fail to reimage from Waiting for others to Backlog on the User-Elukey board.
Wed, Aug 7, 1:25 PM · User-Elukey, DC-Ops, ops-codfw, Infrastructure-Foundations, SRE
elukey moved T368744: Allow debmonitor to store the Debian version-id in the OS field from Waiting for others to In Progress on the User-Elukey board.
Wed, Aug 7, 1:24 PM · Patch-For-Review, SRE-tools, User-Elukey, Infrastructure-Foundations
elukey added a comment to T371899: Review how the debmonitor server processes hosts/images when starting fresh.

Adding some ideas: https://docs.djangoproject.com/en/5.0/ref/models/querysets/#bulk-update

Wed, Aug 7, 1:23 PM · User-Elukey, Infrastructure-Foundations
elukey added a comment to T368744: Allow debmonitor to store the Debian version-id in the OS field.

Buster and Bookworm rollouts done, no big issues registered. The only drawback is that due to the high volume of writes to the db (since we are changing the Debian version etc.) the UI becomes unresponsive for a bit, and we get some alarms. This is due to T371899.

Wed, Aug 7, 1:20 PM · Patch-For-Review, SRE-tools, User-Elukey, Infrastructure-Foundations
elukey updated the task description for T371132: Provision cookbook not setting serial console and other settings.
Wed, Aug 7, 10:15 AM · Data-Persistence, User-Elukey, DC-Ops, Infrastructure-Foundations
elukey updated the task description for T371132: Provision cookbook not setting serial console and other settings.
Wed, Aug 7, 9:54 AM · Data-Persistence, User-Elukey, DC-Ops, Infrastructure-Foundations
elukey added a project to T341843: Netbox rq.timeouts.JobTimeoutException: User-Elukey.
Wed, Aug 7, 9:23 AM · Patch-For-Review, User-Elukey, Infrastructure-Foundations, netbox
elukey added a comment to T368744: Allow debmonitor to store the Debian version-id in the OS field.

Rolled out the change to the hadoop cluster, this is the only error that I got:

Wed, Aug 7, 8:41 AM · Patch-For-Review, SRE-tools, User-Elukey, Infrastructure-Foundations

Tue, Aug 6

elukey updated the task description for T371132: Provision cookbook not setting serial console and other settings.
Tue, Aug 6, 3:27 PM · Data-Persistence, User-Elukey, DC-Ops, Infrastructure-Foundations
elukey added a project to T371899: Review how the debmonitor server processes hosts/images when starting fresh: User-Elukey.
Tue, Aug 6, 3:07 PM · User-Elukey, Infrastructure-Foundations
elukey added a comment to T368744: Allow debmonitor to store the Debian version-id in the OS field.

The issue is described in T371899. I proceeded anyway to upgrade both debmonitor server hosts, all good so far.

Tue, Aug 6, 3:06 PM · Patch-For-Review, SRE-tools, User-Elukey, Infrastructure-Foundations
elukey created T371899: Review how the debmonitor server processes hosts/images when starting fresh.
Tue, Aug 6, 3:05 PM · User-Elukey, Infrastructure-Foundations
elukey added a comment to T368023: Move the private Puppet repository to puppetserver1001.

Fix for the dump-cloud-ip-ranges timer/unit rolled out, I also tried to do a manual puppet private commit and it worked. Let's wait for the next run of the timer to confirm that everything is working.

Tue, Aug 6, 7:58 AM · Patch-For-Review, User-Elukey, Puppet-Infrastructure, SRE, Infrastructure-Foundations

Mon, Aug 5

elukey added a comment to T365372: Spicerack: expand Supermicro support in the Redfish module.
  • The Netbox custom script for network provisioning now asks for a MAC address (for the mgmt interface), which is mandatory for each Supermicro host.
  • spicerack's redfish module is now able to create admin users in the BMC (only for Supermicro).
Mon, Aug 5, 3:45 PM · Patch-For-Review, DC-Ops, Infrastructure-Foundations, SRE-tools, User-Elukey, Spicerack
elukey updated the task description for T371132: Provision cookbook not setting serial console and other settings.
Mon, Aug 5, 2:50 PM · Data-Persistence, User-Elukey, DC-Ops, Infrastructure-Foundations
elukey added a comment to T368023: Move the private Puppet repository to puppetserver1001.

After a brainbounce with Joe on the SRE IRC channel, we noticed that the environment variables when running the post-commit hook (in my local repro) contained GIT_INDEX_FILE pointing to the repo representing /srv/private, instead of /var/lib/git/etc... Simply unsetting the variable in the post-commit hook makes the issue disappear.
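
As a rough illustration of the fix described in this comment (not the actual hook code), here is a minimal Python sketch of a post-commit hook helper that drops GIT_INDEX_FILE from the environment before running git against a second repository. The function names and the tuple of variables are hypothetical, and the only variable the comment mentions is GIT_INDEX_FILE.

```python
import os
import subprocess

# Variables inherited from the hook environment that should not leak into
# git commands aimed at a different repository. Only GIT_INDEX_FILE is
# mentioned in the comment above; anything else added here is an assumption.
HOOK_VARS_TO_DROP = ("GIT_INDEX_FILE",)


def clean_git_env(env=None):
    """Return a copy of the environment without the hook-specific variables."""
    source = os.environ if env is None else env
    return {key: value for key, value in source.items()
            if key not in HOOK_VARS_TO_DROP}


def git_in_other_repo(repo_path, *args):
    """Run a git command in repo_path (hypothetical path) with a cleaned env."""
    return subprocess.run(
        ["git", "-C", repo_path, *args],
        env=clean_git_env(),
        check=True,
        capture_output=True,
        text=True,
    )
```

Without the cleanup, a git process started from the hook would inherit GIT_INDEX_FILE and could operate on the index of the repository that triggered the hook rather than the one it was pointed at.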

Mon, Aug 5, 11:05 AM · Patch-For-Review, User-Elukey, Puppet-Infrastructure, SRE, Infrastructure-Foundations
elukey added a comment to T368023: Move the private Puppet repository to puppetserver1001.

I managed to repro the issue on a local docker container, and I can say that it is definitely the code of external_clouds_vendors that causes this. The repro is using only the post-commit hook to propagate the change between the two fake private repos.

Mon, Aug 5, 10:03 AM · Patch-For-Review, User-Elukey, Puppet-Infrastructure, SRE, Infrastructure-Foundations

Aug 2 2024

elukey updated the task description for T371132: Provision cookbook not setting serial console and other settings.
Aug 2 2024, 3:10 PM · Data-Persistence, User-Elukey, DC-Ops, Infrastructure-Foundations

Aug 1 2024

elukey added a comment to T368023: Move the private Puppet repository to puppetserver1001.

It happened again, and again only with requestctl/ipblocks-related staged content. Being staged meant that git pull worked, so it wouldn't have broken any regular commit happening under /srv/private (at least IIUC).

Aug 1 2024, 4:23 PM · Patch-For-Review, User-Elukey, Puppet-Infrastructure, SRE, Infrastructure-Foundations
elukey updated the task description for T371132: Provision cookbook not setting serial console and other settings.
Aug 1 2024, 3:47 PM · Data-Persistence, User-Elukey, DC-Ops, Infrastructure-Foundations
elukey added a comment to T368744: Allow debmonitor to store the Debian version-id in the OS field.

Tried to test the new debmonitor-server on debmonitor2003:

  • changed sretest1001's /etc/hosts to point debmonitor.discovery.wmnet to debmonitor2003's IP
  • dropped via spicerack the data on debmonitor for the host
  • ran the debmonitor client on the host
Aug 1 2024, 3:21 PM · Patch-For-Review, SRE-tools, User-Elukey, Infrastructure-Foundations
elukey updated the task description for T371132: Provision cookbook not setting serial console and other settings.
Aug 1 2024, 10:04 AM · Data-Persistence, User-Elukey, DC-Ops, Infrastructure-Foundations

Jul 31 2024

elukey updated the task description for T371132: Provision cookbook not setting serial console and other settings.
Jul 31 2024, 4:12 PM · Data-Persistence, User-Elukey, DC-Ops, Infrastructure-Foundations
elukey removed a watcher for Machine-Learning-Team: elukey.
Jul 31 2024, 3:29 PM
elukey added a comment to T368366: Upgrade K8s docker images running in Wikimedia production on Buster to either Bullseye or Bookworm.

From docker report (k8s images) set to work only with Bullseye+ images:

Jul 31 2024, 11:52 AM · serviceops, Security, Infrastructure-Foundations
elukey renamed T367427: Cleanup old Docker images running Debian Stretch/Jessie/Buster from Cleanup old Docker images running Debian Stretch/Jessie to Cleanup old Docker images running Debian Stretch/Jessie/Buster.
Jul 31 2024, 11:52 AM · Patch-For-Review, User-Elukey, Infrastructure-Foundations, serviceops