Scott_French (Scott French)
User

User Details

User Since
Jan 18 2024, 5:33 PM (32 w, 6 d)
Availability
Available
LDAP User
Scott French
MediaWiki User
SFrench-WMF

Recent Activity

Today

Scott_French created T374047: Pre-switchover cookbook testing.
Wed, Sep 4, 5:52 PM · Datacenter-Switchover, serviceops
Scott_French claimed T330273: sre.switchdc.mediawiki cookbook should take a task-id argument.
Wed, Sep 4, 5:10 PM · Patch-For-Review, serviceops, Datacenter-Switchover
Scott_French placed T374018: decommission mw2260.codfw.wmnet, mw2267.codfw.wmnet up for grabs.
Wed, Sep 4, 4:39 PM · SRE, ops-codfw, DC-Ops, serviceops, decommission-hardware
Scott_French created T374018: decommission mw2260.codfw.wmnet, mw2267.codfw.wmnet.
Wed, Sep 4, 3:10 PM · SRE, ops-codfw, DC-Ops, serviceops, decommission-hardware
Scott_French updated the task description for T373934: Update iDRAC on mw2260.codfw.wmnet.
Wed, Sep 4, 2:04 AM · SRE, ops-codfw, Kubernetes, Prod-Kubernetes, DC-Ops, serviceops

Yesterday

Scott_French added a comment to P68611 Masterwork From Distant Lands.

To recap, this was the result of the sre.hosts.decommission run on 2024-09-02 for mw[2261-2262,2268-2270].codfw.wmnet (5 hosts in this rack).

Tue, Sep 3, 11:09 PM
Scott_French updated the task description for T373916: Relabel codfw kubernetes nodes.
Tue, Sep 3, 9:38 PM · SRE, ops-codfw, Kubernetes, Prod-Kubernetes, DC-Ops, serviceops
Scott_French created T373934: Update iDRAC on mw2260.codfw.wmnet.
Tue, Sep 3, 8:07 PM · SRE, ops-codfw, Kubernetes, Prod-Kubernetes, DC-Ops, serviceops
Scott_French added a comment to T352245: Migrate etcd::tlsproxy Nginx certs and etcd itself to PKI.

At a high level, we can split this into two phases: TLS proxy (nginx) and etcd.

Tue, Sep 3, 5:37 PM · serviceops

Sat, Aug 31

Scott_French added a comment to T352245: Migrate etcd::tlsproxy Nginx certs and etcd itself to PKI.

As we've reached the end of August and the v3 migration is still pending due to higher priority work, I think it's time to reassess this.

Sat, Aug 31, 12:22 AM · serviceops

Fri, Aug 30

Scott_French updated the task description for T373591: Relabel codfw kubernetes nodes.
Fri, Aug 30, 4:47 PM · SRE, ops-codfw, Kubernetes, Prod-Kubernetes, DC-Ops, serviceops
Scott_French claimed T328908: Migrate sre.switchdc.mediawiki to spicerack class API.
Fri, Aug 30, 1:16 AM · Patch-For-Review, Data-Persistence, serviceops, Datacenter-Switchover, SRE

Thu, Aug 29

Scott_French added a comment to T372603: Regenerate UcfirstOverrides.php for PHP 7.4 -> 8.1 transition.

Although a bit of a process, the following will definitely work:

Thu, Aug 29, 10:58 PM · serviceops
Scott_French added a comment to T370934: Build and publish multiple MediaWiki production images for a given set of PHP versions.

Thanks for chatting earlier today @dduvall.

Thu, Aug 29, 6:53 PM · Kubernetes, Deployments, Release-Engineering-Team (Priority Backlog 📥)
Scott_French added a comment to T371273: Verify our current wikikube capacity (in both DCs) can handle all our traffic.

Spent a bit of time thinking about this today.

Thu, Aug 29, 12:25 AM · Datacenter-Switchover, serviceops

Wed, Aug 28

Scott_French added a comment to T372602: Prepare PHP 8.1 production images.

The 8.1-based production images are ready to go and seem to work per some basic local smoke tests.

Wed, Aug 28, 4:04 PM · Patch-For-Review, serviceops

Tue, Aug 27

Scott_French created T373491: Relabel codfw kubernetes nodes.
Tue, Aug 27, 10:08 PM · ops-codfw, Kubernetes, Prod-Kubernetes, DC-Ops, serviceops
Scott_French added a comment to T372603: Regenerate UcfirstOverrides.php for PHP 7.4 -> 8.1 transition.

Alright, so there is at least one tricky bit to this: How do we run generateUpperCharTable.php on 8.1 without also installing 8.1 on maintenance hosts?

Tue, Aug 27, 12:07 AM · serviceops

Mon, Aug 26

Scott_French created T373401: Relabel codfw kubernetes nodes.
Mon, Aug 26, 10:18 PM · ops-codfw, Kubernetes, Prod-Kubernetes, DC-Ops, serviceops
Scott_French added a comment to T372507: Prepare WMF PHP 8.1 packages for Bullseye.

One last test and point-of-note:

Mon, Aug 26, 7:22 PM · MediaWiki-Platform-Team (Radar), serviceops
Scott_French updated the task description for T359423: Migrate charts to Calico Network Policies.
Mon, Aug 26, 6:18 PM · Data-Platform-SRE, Prod-Kubernetes, Kubernetes, serviceops

Fri, Aug 23

Scott_French updated subscribers of T372507: Prepare WMF PHP 8.1 packages for Bullseye.

Following up on the status of the php-geoip extension (h/t to @Krinkle for all the discussion out of band):

Fri, Aug 23, 6:48 PM · MediaWiki-Platform-Team (Radar), serviceops

Thu, Aug 22

Scott_French added a comment to T373037: Make ParserCache more like a ring.

Thank you very much for the explanation in T373037#10085662, Amir - that makes sense. On my quick read of the key-structure description, it did not occur to me that both "tiers" are in the same store.

Thu, Aug 22, 11:03 PM · Epic, DBA
Scott_French added a comment to T372507: Prepare WMF PHP 8.1 packages for Bullseye.

Shared objects are now present in all three packages, as well as a previously missing .ini file from wikidiff2. Verified that a local build of docker-registry.wikimedia.org/php8.1-fpm-multiversion-base no longer produces warnings about missing extensions.

Thu, Aug 22, 8:33 PM · MediaWiki-Platform-Team (Radar), serviceops
Scott_French added a comment to T372507: Prepare WMF PHP 8.1 packages for Bullseye.

Alright, I think I see a way out of this: I'd overlooked that the debian/rules files for these packages set an explicit INSTALL_ROOT for make install in override_dh_auto_install, which did not include the version number (i.e., does not match the "manually made coinstallable" package name). Fixing that puts the result where dh-php expects it to be.

Thu, Aug 22, 7:31 PM · MediaWiki-Platform-Team (Radar), serviceops
Scott_French added a comment to T373037: Make ParserCache more like a ring.

Interesting! @Ladsgroup - Could you expand on the first point? ("Make sure the sharding ...") The reference to a 50% flush on section removal sounds like going from a naive mod N (= number of sections) to a static number of logical shards (so, approaching consistent hashing, which aligns with your later points), but I'm not sure I understand the relationship with the cache key structure in the first sentence.

Thu, Aug 22, 3:29 PM · Epic, DBA
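
The contrast drawn in the comment above, between a naive mod N over the number of sections and a static number of logical shards, can be illustrated with a minimal sketch. Everything here is hypothetical (the section names, the shard count of 256, and all helper functions) and it is not a description of how ParserCache actually assigns keys. It only shows why removing a section remaps most keys under mod N, but only the removed section's keys under a fixed shard map.

```python
import hashlib
from typing import Callable, Dict, List

NUM_SHARDS = 256  # hypothetical, fixed number of logical shards


def stable_hash(key: str) -> int:
    """Deterministic hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


def naive_section(key: str, sections: List[str]) -> str:
    """Naive placement: hash mod N, where N is the current number of sections."""
    return sections[stable_hash(key) % len(sections)]


def build_shard_map(sections: List[str]) -> Dict[int, str]:
    """Assign each logical shard to a section, round-robin."""
    return {shard: sections[shard % len(sections)] for shard in range(NUM_SHARDS)}


def remove_section(shard_map: Dict[int, str], removed: str,
                   remaining: List[str]) -> Dict[int, str]:
    """Reassign only the shards owned by the removed section."""
    new_map = dict(shard_map)
    orphaned = [shard for shard, owner in shard_map.items() if owner == removed]
    for i, shard in enumerate(orphaned):
        new_map[shard] = remaining[i % len(remaining)]
    return new_map


def sharded_section(key: str, shard_map: Dict[int, str]) -> str:
    """Placement via logical shards: the modulus never changes."""
    return shard_map[stable_hash(key) % NUM_SHARDS]


def moved_fraction(keys: List[str], before: Callable[[str], str],
                   after: Callable[[str], str]) -> float:
    """Fraction of keys whose section differs between two placements."""
    return sum(1 for k in keys if before(k) != after(k)) / len(keys)


if __name__ == "__main__":
    sections = ["pc1", "pc2", "pc3", "pc4"]  # hypothetical section names
    remaining = sections[:3]
    keys = [f"key-{i}" for i in range(10000)]
    old_map = build_shard_map(sections)
    new_map = remove_section(old_map, "pc4", remaining)
    print("naive mod N moved:", moved_fraction(
        keys, lambda k: naive_section(k, sections),
        lambda k: naive_section(k, remaining)))
    print("logical shards moved:", moved_fraction(
        keys, lambda k: sharded_section(k, old_map),
        lambda k: sharded_section(k, new_map)))
```

With four sections, the mod N variant is expected to move roughly three quarters of the keys when one section is removed, while the shard map moves only the keys that lived on the removed section.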

Wed, Aug 21

Scott_French added a comment to T372507: Prepare WMF PHP 8.1 packages for Bullseye.

While working through the production image definitions for T372602, I discovered that the three extension packages maintained by WMF (php-luasandbox, php-wmerrors, wikidiff2) build successfully, but with incomplete contents.

Wed, Aug 21, 9:59 PM · MediaWiki-Platform-Team (Radar), serviceops
Scott_French added a comment to T372507: Prepare WMF PHP 8.1 packages for Bullseye.

Verified that in a fresh docker-registry.wikimedia.org/bullseye:latest image, I can successfully:

Wed, Aug 21, 6:34 PM · MediaWiki-Platform-Team (Radar), serviceops

Tue, Aug 20

Scott_French added a subtask for T370962: Southward Datacenter Switchover (September 2024): T372849: Determine switchover changes for migration of video scaling to k8s.
Tue, Aug 20, 1:47 AM · Datacenter-Switchover, serviceops
Scott_French added a parent task for T372849: Determine switchover changes for migration of video scaling to k8s: T370962: Southward Datacenter Switchover (September 2024).
Tue, Aug 20, 1:47 AM · Datacenter-Switchover, serviceops
Scott_French created T372849: Determine switchover changes for migration of video scaling to k8s.
Tue, Aug 20, 1:45 AM · Datacenter-Switchover, serviceops

Fri, Aug 16

Scott_French added a comment to T359130: Update DC switchover cookbooks to handle maintenance scripts on k8s.

Thanks for writing this up, Reuven.

Fri, Aug 16, 11:32 PM · Datacenter-Switchover, serviceops, MW-on-K8s
Scott_French added a comment to T367118: Control mw-on-k8s periodic maintenance jobs with an etcd value.

Agreed, yeah: Some subset of those items will need to be done before the switchover, but exactly which subset depends on how far we expect things to be by then. I'll follow up on the task shortly.

Fri, Aug 16, 11:15 PM · Datacenter-Switchover, serviceops, MW-on-K8s
Scott_French added a comment to T367118: Control mw-on-k8s periodic maintenance jobs with an etcd value.

Reminder to self: once live, wire this into the 01-stop-maintenance.py and 08-start-maintenance.py cookbooks.

Fri, Aug 16, 5:01 PM · Datacenter-Switchover, serviceops, MW-on-K8s
Scott_French claimed T367118: Control mw-on-k8s periodic maintenance jobs with an etcd value.

+1 to option #3 as the most sensible / obvious one: adding something more complex than a single global boolean invites odd nonsense states in combination with read-only (currently a per-DC toggle) and primary DC.

Fri, Aug 16, 5:00 PM · Datacenter-Switchover, serviceops, MW-on-K8s
Scott_French added a subtask for T370962: Southward Datacenter Switchover (September 2024): T372649: Audit / update switchover-related cookbooks.
Fri, Aug 16, 2:56 PM · Datacenter-Switchover, serviceops
Scott_French added a parent task for T372649: Audit / update switchover-related cookbooks: T370962: Southward Datacenter Switchover (September 2024).
Fri, Aug 16, 2:56 PM · Patch-For-Review, Datacenter-Switchover, serviceops
Scott_French created T372649: Audit / update switchover-related cookbooks.
Fri, Aug 16, 2:54 PM · Patch-For-Review, Datacenter-Switchover, serviceops

Thu, Aug 15

Scott_French created T372605: Extend x-wikimedia-debug-routing.lua to support PHP 8.1 mw-debug deployment.
Thu, Aug 15, 9:32 PM · serviceops
Scott_French created T372604: Turn up PHP 8.1-flavored mw-debug k8s deployment.
Thu, Aug 15, 9:24 PM · serviceops
Scott_French created T372603: Regenerate UcfirstOverrides.php for PHP 7.4 -> 8.1 transition.
Thu, Aug 15, 9:11 PM · serviceops
Scott_French created T372602: Prepare PHP 8.1 production images.
Thu, Aug 15, 9:03 PM · Patch-For-Review, serviceops

Wed, Aug 14

Scott_French added a comment to T370304: Bursts of occasional severe contention on s4 (commonswiki) primary mariadb causing recurrent user-facing outages on all wikis.

Quick update from another occurrence starting at ~ 20:45 UTC today:

Wed, Aug 14, 11:43 PM · MediaWiki-Platform-Team (Radar), Vuln-DoS, SecTeam-Processed, Security, Essential-Work, Content-Transform-Team-WIP, User-notice, Wikimedia-Incident, DBA, Wikimedia-production-error
Scott_French created T372507: Prepare WMF PHP 8.1 packages for Bullseye.
Wed, Aug 14, 7:52 PM · MediaWiki-Platform-Team (Radar), serviceops

Tue, Aug 13

Scott_French added a comment to T370304: Bursts of occasional severe contention on s4 (commonswiki) primary mariadb causing recurrent user-facing outages on all wikis.

Additional period(s) of badness later today starting around 21:00 UTC.

Tue, Aug 13, 10:05 PM · MediaWiki-Platform-Team (Radar), Vuln-DoS, SecTeam-Processed, Security, Essential-Work, Content-Transform-Team-WIP, User-notice, Wikimedia-Incident, DBA, Wikimedia-production-error

Thu, Aug 8

Scott_French added a comment to T356293: Migrate MW appservers' base images to bullseye.

Though mainly focused on supporting the php 8.1 migration, there's ongoing work to support multiple base-image “flavors” and a helm-release-to-flavor mapping in scap (T370934), which may be useful here.

Thu, Aug 8, 11:41 PM · MW-on-K8s, Patch-For-Review, serviceops, SRE

Wed, Aug 7

Scott_French added a comment to T371885: Gaps in Grafana graphs using Thanos.

For the endpoints marked down: it looks as if prometheus is scraping both container ports - i.e., 9102 (correct) and 9125 (statsd listen port, incorrect).

Wed, Aug 7, 7:18 PM · SRE Observability (FY2024/2025-Q1), serviceops, MW-on-K8s, Grafana, Observability-Metrics
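
The symptom described above is consistent with pod-based service discovery producing one scrape target per declared container port. A minimal sketch of that behavior and of a port-name filter follows. The port names ("metrics", "statsd"), the pod IP, and both helper functions are assumptions for illustration and are not taken from the actual chart or Prometheus configuration.

```python
from typing import Dict, List


def discovered_targets(pod_ip: str, container_ports: List[Dict]) -> List[str]:
    """One target per declared container port, with no filtering."""
    return [f"{pod_ip}:{port['containerPort']}" for port in container_ports]


def metrics_targets(pod_ip: str, container_ports: List[Dict],
                    metrics_port_name: str) -> List[str]:
    """Keep only the port that actually serves metrics over HTTP."""
    return [
        f"{pod_ip}:{port['containerPort']}"
        for port in container_ports
        if port.get("name") == metrics_port_name
    ]


# Hypothetical exporter container: 9102 serves metrics, 9125 is the
# statsd listen port and cannot answer an HTTP scrape.
EXPORTER_PORTS = [
    {"name": "metrics", "containerPort": 9102},
    {"name": "statsd", "containerPort": 9125, "protocol": "UDP"},
]

if __name__ == "__main__":
    print(discovered_targets("10.0.0.1", EXPORTER_PORTS))
    print(metrics_targets("10.0.0.1", EXPORTER_PORTS, "metrics"))
```

Without the filter, the second target would be scraped and reported as down, which matches the endpoints marked down in the comment.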

Tue, Aug 6

Scott_French added a comment to T368096: mediawiki: migrate from image-suggestion to data-gateway.

Ah, these are good questions.

Tue, Aug 6, 6:52 PM · Cassandra, serviceops
Scott_French added a comment to T370934: Build and publish multiple MediaWiki production images for a given set of PHP versions.

Agreed with @Joe's assessment above: for each image type (e.g., mediawiki), scap would need to support a configurable set of base images from which the image will be built ("flavors").

Tue, Aug 6, 4:53 PM · Kubernetes, Deployments, Release-Engineering-Team (Priority Backlog 📥)

Aug 1 2024

Scott_French closed T369921: Support warmup for local caches in mw-on-k8s, a subtask of T370962: Southward Datacenter Switchover (September 2024), as Resolved.
Aug 1 2024, 11:55 PM · Datacenter-Switchover, serviceops
Scott_French closed T369921: Support warmup for local caches in mw-on-k8s as Resolved.
Aug 1 2024, 11:55 PM · Patch-For-Review, serviceops
Scott_French added a comment to T369921: Support warmup for local caches in mw-on-k8s.

With the cache_warmup class relocated, I think the near-term work is done. There are two TODOs related to fully removing the script etc. from the maintenance hosts, but IMO we can just wait for the latter to go away as planned.

Aug 1 2024, 11:55 PM · Patch-For-Review, serviceops
Scott_French added a comment to T369921: Support warmup for local caches in mw-on-k8s.

Alright, this should now be done:

  • the script's clone subcommand enumerates pod IP:port pairs via the -tls-service Endpoints object(s)
  • the script and URL files are now installed on the deployment hosts, where the necessary k8s configs and credentials are present
  • the switchdc warmup caches cookbook now invokes the script on the primary deployment host via cumin (with updated arguments)
Aug 1 2024, 5:37 PM · Patch-For-Review, serviceops
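
The first step in the list above, enumerating pod IP:port pairs from an Endpoints object, can be sketched as follows. This is not the warmup script itself: the function name, the sample object, and its service name are hypothetical, and the only thing assumed about Kubernetes is the standard shape of a v1 Endpoints object (subsets containing addresses, notReadyAddresses, and ports).

```python
from typing import Dict, Iterator, Tuple


def endpoint_pairs(endpoints: Dict) -> Iterator[Tuple[str, int]]:
    """Yield (ip, port) for every ready address in an Endpoints object."""
    for subset in endpoints.get("subsets") or []:
        # Only "addresses" are ready; "notReadyAddresses" are skipped.
        addresses = subset.get("addresses") or []
        ports = subset.get("ports") or []
        for address in addresses:
            for port in ports:
                yield address["ip"], port["port"]


# Hypothetical Endpoints object, as it might be returned by the API as JSON.
SAMPLE_ENDPOINTS = {
    "kind": "Endpoints",
    "apiVersion": "v1",
    "metadata": {"name": "example-tls-service"},
    "subsets": [
        {
            "addresses": [{"ip": "10.0.0.1"}, {"ip": "10.0.0.2"}],
            "notReadyAddresses": [{"ip": "10.0.0.3"}],
            "ports": [{"name": "tls", "port": 4443, "protocol": "TCP"}],
        }
    ],
}

if __name__ == "__main__":
    for ip, port in endpoint_pairs(SAMPLE_ENDPOINTS):
        print(f"{ip}:{port}")
```

Skipping not-ready addresses is a design choice in this sketch; whether the real script does the same is not stated in the comment.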

Jul 30 2024

Scott_French added a subtask for T370962: Southward Datacenter Switchover (September 2024): T369921: Support warmup for local caches in mw-on-k8s.
Jul 30 2024, 4:26 PM · Datacenter-Switchover, serviceops
Scott_French added a parent task for T369921: Support warmup for local caches in mw-on-k8s: T370962: Southward Datacenter Switchover (September 2024).
Jul 30 2024, 4:26 PM · Patch-For-Review, serviceops

Jul 26 2024

Scott_French created T371130: MoveComms support for Southward Datacenter Switchover (September 2024).
Jul 26 2024, 5:40 PM · MoveComms-Support, Datacenter-Switchover, serviceops

Jul 25 2024

Scott_French created T371045: Relabel eqiad kubernetes nodes.
Jul 25 2024, 5:11 PM · SRE, ops-eqiad, Kubernetes, Prod-Kubernetes, DC-Ops, serviceops

Jul 24 2024

Scott_French added a project to T370962: Southward Datacenter Switchover (September 2024): Datacenter-Switchover.
Jul 24 2024, 10:21 PM · Datacenter-Switchover, serviceops
Scott_French updated the task description for T370962: Southward Datacenter Switchover (September 2024).
Jul 24 2024, 9:30 PM · Datacenter-Switchover, serviceops
Scott_French updated the task description for T370962: Southward Datacenter Switchover (September 2024).
Jul 24 2024, 9:14 PM · Datacenter-Switchover, serviceops
Scott_French created T370962: Southward Datacenter Switchover (September 2024).
Jul 24 2024, 9:07 PM · Datacenter-Switchover, serviceops

Jul 23 2024

Scott_French updated the task description for T367949: Spin down api_appserver and appserver clusters.
Jul 23 2024, 8:56 PM · Patch-For-Review, Release-Engineering-Team (Seen), SRE, Traffic, serviceops, MW-on-K8s
Scott_French added a comment to T367949: Spin down api_appserver and appserver clusters.

Many thanks, all who helped get this out the door.

Jul 23 2024, 8:56 PM · Patch-For-Review, Release-Engineering-Team (Seen), SRE, Traffic, serviceops, MW-on-K8s
Scott_French added a comment to T367949: Spin down api_appserver and appserver clusters.

Silenced ProbeDown for api-https:443 and appservers-https:443 for 24h:

  • f6f67d8d-6381-43b3-9262-9a8cf58f2b19
  • ed0d352b-fb83-4bd4-a586-142b100ca6e5
Jul 23 2024, 4:18 PM · Patch-For-Review, Release-Engineering-Team (Seen), SRE, Traffic, serviceops, MW-on-K8s

Jul 22 2024

Scott_French added a comment to T367547: Cloud VPS "puppet-diffs" project Buster deprecation.

Following up here after various chats on IRC:

Jul 22 2024, 9:49 PM · Infrastructure-Foundations, Puppet CI, Cloud-VPS (Debian Buster Deprecation)

Jul 19 2024

Scott_French created P66845 Ic48417e5acb0a64cd6af1c66a2b25853a8c2a5ef dry-run example.
Jul 19 2024, 9:36 PM

Jul 18 2024

Scott_French added a comment to T370425: Misbehaving mw-api-ext pods serving 5xx.

In both cases, workers start failing with SIGILL at the start of badness, e.g. (from mw-api-ext.eqiad.main-7686884f77-ql69d):

Jul 18 2024, 7:51 PM · Wikimedia-production-error, serviceops
Scott_French added a comment to T367949: Spin down api_appserver and appserver clusters.

appservers-ro.discovery.wmnet and api-ro.discovery.wmnet now resolve to failoid, by way of manually updating their DYNA records in the wmnet zone template to point to geoip!disc-failoid:

Jul 18 2024, 6:24 PM · Patch-For-Review, Release-Engineering-Team (Seen), SRE, Traffic, serviceops, MW-on-K8s

Jul 16 2024

Scott_French updated subscribers of T367949: Spin down api_appserver and appserver clusters.

Current status:

  • appservers-rw and api-rw are depooled everywhere, and resolve to failoid as of 17:45 UTC
  • api-ro is serving only from eqiad as of 17:40 UTC
  • appservers-ro is depooled everywhere as of 19:25 UTC
Jul 16 2024, 8:14 PM · Patch-For-Review, Release-Engineering-Team (Seen), SRE, Traffic, serviceops, MW-on-K8s

Jul 12 2024

Scott_French triaged T369932: sextant: support module garbage collection as Low priority.
Jul 12 2024, 6:53 PM · serviceops
Scott_French created T369932: sextant: support module garbage collection.
Jul 12 2024, 6:53 PM · serviceops
Scott_French triaged T369921: Support warmup for local caches in mw-on-k8s as Low priority.
Jul 12 2024, 5:13 PM · Patch-For-Review, serviceops
Scott_French created T369921: Support warmup for local caches in mw-on-k8s.
Jul 12 2024, 4:36 PM · Patch-For-Review, serviceops

Jul 11 2024

Scott_French added a comment to T369745: Fix errors in Commons Analytics OpenAPI spec.

For the record: v1.0.3 is live in staging only (production is untouched), after it became apparent that additional changes are needed. If a 1.0.4 is available with an updated swagger spec, let me know and I'm happy to assist.

Jul 11 2024, 7:29 PM · Data Products (Data Products Sprint 16), Commons-Impact-Metrics, Documentation, AQS2.0
Scott_French closed T361835: Commons Impact Metrics AQS 2.0 Deployment to Staging and Production as Resolved.

Ah, great - thanks for confirming those older docs will go away, @mforns.

Jul 11 2024, 5:04 PM · Data Products (Data Products Sprint 16), Patch-For-Review, serviceops, Service-deployment-requests, SRE
Scott_French added a comment to T369366: Migrate DNS depooling of sites from operations/dns (git) to confctl.

+1 to using a more descriptive name for the resource operated on.

Jul 11 2024, 3:44 PM · SRE, Traffic

Jul 10 2024

Scott_French added a comment to T369366: Migrate DNS depooling of sites from operations/dns (git) to confctl.

Cool, it sounds like the conversation has evolved to using a dedicated schema, and we're on the same page that a multi-value set should work (to accommodate reason).

Jul 10 2024, 9:30 PM · SRE, Traffic
Scott_French added a comment to T361835: Commons Impact Metrics AQS 2.0 Deployment to Staging and Production.

Alright, good(er) news: the service is now live at /api/rest_v1/metrics/commons-analytics.

Jul 10 2024, 4:24 PM · Data Products (Data Products Sprint 16), Patch-For-Review, serviceops, Service-deployment-requests, SRE
Scott_French added a comment to T361935: Adapt the WDQS Streaming Updater to update multiple WDQS subgraphs.

Great, thank you very much @dcausse for cleaning up the old config and @Clement_Goubert for confirming.

Jul 10 2024, 3:53 PM · Discovery-Search (Current work), Wikidata

Jul 9 2024

Scott_French added a comment to T361835: Commons Impact Metrics AQS 2.0 Deployment to Staging and Production.

Ah, thanks for surfacing that, @mforns.

Jul 9 2024, 11:48 PM · Data Products (Data Products Sprint 16), Patch-For-Review, serviceops, Service-deployment-requests, SRE
Scott_French added a comment to T369366: Migrate DNS depooling of sites from operations/dns (git) to confctl.

In short, and I realize this doesn't help much, my understanding is that what makes sense as an object name vs. an object tag is really up to you (e.g., ergonomics of tag selectors for common operations).

Jul 9 2024, 9:49 PM · SRE, Traffic
Scott_French updated subscribers of T361935: Adapt the WDQS Streaming Updater to update multiple WDQS subgraphs.

Found my way here via the highly informative "wdqs-streaming-updater-test-T361935/WMF" user-agent you've used. Thanks for that!

Jul 9 2024, 8:01 PM · Discovery-Search (Current work), Wikidata
Scott_French added a comment to T369366: Migrate DNS depooling of sites from operations/dns (git) to confctl.

Ah, interesting - I wasn't aware of the prior art with dnsbox. Indeed, reusing node for a fundamentally "host shaped thing" where (1) you anticipate eventually using as-yet unused fields and (2) do not anticipate ever needing to enrich node with new fields, seems less concerning.

Jul 9 2024, 6:27 PM · SRE, Traffic
Scott_French added a comment to T361835: Commons Impact Metrics AQS 2.0 Deployment to Staging and Production.

@SGupta-WMF - thanks for documenting the API at [0]. One thing I noticed while updating wikitech: it looks like the examples assume the service is reachable at /api/rest_v1/metrics/commons-impact-analytics rather than /api/rest_v1/metrics/commons-impact.

Jul 9 2024, 5:33 PM · Data Products (Data Products Sprint 16), Patch-For-Review, serviceops, Service-deployment-requests, SRE
Scott_French updated the task description for T361835: Commons Impact Metrics AQS 2.0 Deployment to Staging and Production.
Jul 9 2024, 5:28 PM · Data Products (Data Products Sprint 16), Patch-For-Review, serviceops, Service-deployment-requests, SRE
Scott_French added a comment to T361835: Commons Impact Metrics AQS 2.0 Deployment to Staging and Production.

Alright, good news: /api/rest_v1/metrics/commons-impact should now be publicly available.

Jul 9 2024, 5:22 PM · Data Products (Data Products Sprint 16), Patch-For-Review, serviceops, Service-deployment-requests, SRE
Scott_French updated the task description for T361835: Commons Impact Metrics AQS 2.0 Deployment to Staging and Production.
Jul 9 2024, 5:14 PM · Data Products (Data Products Sprint 16), Patch-For-Review, serviceops, Service-deployment-requests, SRE
Scott_French updated the task description for T361835: Commons Impact Metrics AQS 2.0 Deployment to Staging and Production.
Jul 9 2024, 4:30 PM · Data Products (Data Products Sprint 16), Patch-For-Review, serviceops, Service-deployment-requests, SRE
Scott_French added a comment to T369366: Migrate DNS depooling of sites from operations/dns (git) to confctl.

Thanks for the excellent / detailed write-up, @ssingh!

Jul 9 2024, 3:58 PM · SRE, Traffic
Scott_French added a comment to T361835: Commons Impact Metrics AQS 2.0 Deployment to Staging and Production.

@mforns - The v1.0.2 image is now live in staging. Please take a look when you get a chance, and let me know if / when you'd like me to proceed with the remaining steps.

Jul 9 2024, 2:57 PM · Data Products (Data Products Sprint 16), Patch-For-Review, serviceops, Service-deployment-requests, SRE

Jul 8 2024

Scott_French added a comment to T361835: Commons Impact Metrics AQS 2.0 Deployment to Staging and Production.

@mforns - The v1.0.1 image is now live in staging. As before, it can be reached internally at https://commons-impact-analytics.k8s-staging.discovery.wmnet:30443/ from any production host.

Jul 8 2024, 3:54 PM · Data Products (Data Products Sprint 16), Patch-For-Review, serviceops, Service-deployment-requests, SRE

Jul 2 2024

Scott_French added a comment to T361835: Commons Impact Metrics AQS 2.0 Deployment to Staging and Production.

@mforns sure, that's no problem at all! Just let me know when the image is ready.

Jul 2 2024, 7:59 PM · Data Products (Data Products Sprint 16), Patch-For-Review, serviceops, Service-deployment-requests, SRE
Scott_French added a comment to T361835: Commons Impact Metrics AQS 2.0 Deployment to Staging and Production.

Thanks for taking a look, @xcollazo. I'll defer to @mforns and @SGupta-WMF here, as my quick check was only based on comparison with [0] (which uses timestamp in the public API).

Jul 2 2024, 5:01 PM · Data Products (Data Products Sprint 16), Patch-For-Review, serviceops, Service-deployment-requests, SRE
Scott_French added a comment to T361835: Commons Impact Metrics AQS 2.0 Deployment to Staging and Production.

Thanks for the sample data, @xcollazo.

Jul 2 2024, 3:52 PM · Data Products (Data Products Sprint 16), Patch-For-Review, serviceops, Service-deployment-requests, SRE

Jul 1 2024

Scott_French added a comment to T361835: Commons Impact Metrics AQS 2.0 Deployment to Staging and Production.

The service is up and running in staging, and can be reached at https://commons-impact-analytics.k8s-staging.discovery.wmnet:30443 internally.

Jul 1 2024, 5:57 PM · Data Products (Data Products Sprint 16), Patch-For-Review, serviceops, Service-deployment-requests, SRE
Scott_French updated the task description for T361835: Commons Impact Metrics AQS 2.0 Deployment to Staging and Production.
Jul 1 2024, 5:27 PM · Data Products (Data Products Sprint 16), Patch-For-Review, serviceops, Service-deployment-requests, SRE

Jun 28 2024

Scott_French added a comment to T350656: dbconfig bug - "2 instances found for query ...".

I came across this today while looking for prior art on a semi-related theme (invariants to assert on etcd key structure).

Jun 28 2024, 7:43 PM · Data-Persistence, conftool
Scott_French updated the task description for T361835: Commons Impact Metrics AQS 2.0 Deployment to Staging and Production.
Jun 28 2024, 4:12 PM · Data Products (Data Products Sprint 16), Patch-For-Review, serviceops, Service-deployment-requests, SRE
Scott_French added a comment to T361835: Commons Impact Metrics AQS 2.0 Deployment to Staging and Production.

Thanks so much, @SGupta-WMF.

Jun 28 2024, 4:11 PM · Data Products (Data Products Sprint 16), Patch-For-Review, serviceops, Service-deployment-requests, SRE

Jun 27 2024

Scott_French added a comment to T361835: Commons Impact Metrics AQS 2.0 Deployment to Staging and Production.

Thanks for giving that a try, @mforns !

Jun 27 2024, 9:05 PM · Data Products (Data Products Sprint 16), Patch-For-Review, serviceops, Service-deployment-requests, SRE