[Epic] Splitting the graph in WDQS
Open, High, Public

Description

To stabilize the Wikidata Query Service, we are looking into splitting the graph inside Blazegraph into two (or potentially more) subgraphs. This ticket tracks the investigation into what a sensible split would be and what its consequences are, and then the work of making it happen.

Related Objects

Status     Assigned
Open       None
Open       None
Resolved   Gehel
Resolved   Manuel
Resolved   AndrewTavis_WMDE
Declined   None
Declined   None
Resolved   dr0ptp4kt
Resolved   dcausse
Resolved   Lydia_Pintscher
Resolved   Gehel
Resolved   dcausse
Resolved   bking
Resolved   bking
Resolved   bking
Resolved   RKemper
Resolved   RKemper
Resolved   dr0ptp4kt
Resolved   RKemper
Resolved   Dzahn
Resolved   RKemper
Resolved   RKemper
Open       None
Resolved   RKemper
Resolved   Gehel
Resolved   bking
Open       None
Resolved   AndrewTavis_WMDE
Duplicate  AndrewTavis_WMDE
Open       None
Resolved   Gehel
Resolved   dcausse
Resolved   dr0ptp4kt
Resolved   dcausse
Invalid    Sannita
Resolved   Lucas_Werkmeister_WMDE
Resolved   dcausse
Resolved   dcausse
Resolved   pfischer
Resolved   EBernhardson
Resolved   dcausse
Open       None
Open       RKemper
Resolved   Stevemunene
Resolved   Stevemunene
Resolved   Stevemunene
Resolved   Stevemunene
Open       RKemper
Open       RKemper
Invalid    Gehel
Open       RKemper
Open       None
Resolved   Stevemunene
Open       None
Open       None
Open       None
Open       bking
Open       None
Open       RKemper
Resolved   Gehel
Open       AndrewTavis_WMDE
Resolved   pfischer

Event Timeline

Gehel triaged this task as High priority. May 22 2023, 12:58 PM
Gehel moved this task from Incoming to Scaling on the Wikidata-Query-Service board.
Gehel moved this task from Scaling to Epics on the Wikidata-Query-Service board.

When T345475 is done, we should have 3 new WDQS hosts in CODFW that could be used for the graph splitting experiment. @RKemper, let us know if you have any objections to this plan.

We'll use eqiad hosts instead; see T347505.

Mentioned in SAL (#wikimedia-operations) [2024-03-05T22:37:10Z] <bking@cumin2002> START - Cookbook sre.hosts.downtime for 60 days, 0:00:00 on wdqs[1022-1024].eqiad.wmnet with reason: T337013

Mentioned in SAL (#wikimedia-operations) [2024-03-05T22:37:14Z] <bking@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 60 days, 0:00:00 on wdqs[1022-1024].eqiad.wmnet with reason: T337013

Mentioned in SAL (#wikimedia-operations) [2024-03-07T18:22:18Z] <bking@cumin2002> START - Cookbook sre.hosts.downtime for 60 days, 0:00:00 on wdqs[1022-1025].eqiad.wmnet with reason: T337013

Mentioned in SAL (#wikimedia-operations) [2024-03-07T18:22:38Z] <bking@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 60 days, 0:00:00 on wdqs[1022-1025].eqiad.wmnet with reason: T337013

Will this also solve T261764? Or is the graph used by the Query Service different from the one used by the API?

I don't believe T261764 involves the Query Service, so it would not be affected by the graph split.

Mentioned in SAL (#wikimedia-operations) [2024-08-23T15:52:57Z] <bking@cumin2002> START - Cookbook sre.hosts.downtime for 17:00:00 on wdqs[1023-1024].eqiad.wmnet with reason: noisy alerts related to graph split T337013

Mentioned in SAL (#wikimedia-operations) [2024-08-23T15:53:13Z] <bking@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 17:00:00 on wdqs[1023-1024].eqiad.wmnet with reason: noisy alerts related to graph split T337013