Database replication problems - production and labs (tracking)
Closed, Resolved, Public

Description

This is a tracking task to monitor replication problems in the WMF infrastructure, such as:

  • Replication broken or stopped to any server
  • Data or schema differences between a master and some or all of its slaves
  • Constant or intermittent replication lag degrading the service

These tasks are normally handled by the DBA team (part of SRE), often with assistance from Performance-Team, Analytics, Cloud-Services, and the many Product teams.

NOTE: If the problem you are experiencing is about Wiki Replica databases in Cloud-Services (*.{analytics,web}.db.svc.eqiad.wmflabs, *.labsdb), use the Data-Services tag instead; Wiki Replica hosts have their own set of issues including sanitization and multiple user account handling, so even if it is a replica service, the issue may not be replication itself.

Details

Reference
bz48930

Related Objects

This task is connected to more than 200 other tasks. Only direct parents and subtasks are shown here.
Status Assigned
Resolved jcrespo
Resolved Springle
Declined coren
Resolved coren
Declined jcrespo
Duplicate None
Resolved coren
Resolved RyanLane
Resolved chasemp
Resolved coren
Resolved coren
Resolved None
Resolved coren
Resolved coren
Invalid coren
Resolved coren
Resolved Springle
Declined coren
Resolved coren
Declined None
Stalled None
Resolved chasemp
Resolved coren
Duplicate None
Declined coren
Resolved None
Resolved None
Resolved coren
Resolved Springle
Resolved coren
Invalid Springle
Resolved Springle
Resolved Springle
Declined coren
Resolved coren
Resolved Springle
Resolved Springle
Resolved coren
Declined coren
Resolved coren
Resolved None
Resolved coren
Resolved jcrespo
Resolved jcrespo
Declined jcrespo
Duplicate None
Resolved jcrespo
Declined None
Resolved Marostegui
Declined None
Duplicate None
Resolved jcrespo
Resolved jcrespo
Declined None
Resolved srodlund
Declined None
Resolved jcrespo
Resolved chasemp
Declined jcrespo
Invalid None
Resolved jcrespo
Resolved jcrespo
Resolved jcrespo
Resolved jcrespo
Resolved chasemp

Event Timeline

Danny_B renamed this task from (Tracking) Database replication services to Database replication services (tracking). May 27 2016, 6:01 PM
Danny_B removed a subscriber: wikibugs-l-list.
jcrespo renamed this task from Database replication services (tracking) to Database replication services - production and labs (tracking). Nov 15 2016, 4:38 PM
jcrespo updated the task description.
jcrespo renamed this task from Database replication services - production and labs (tracking) to Database replication problems - production and labs (tracking). Nov 15 2016, 4:56 PM
jcrespo claimed this task.

Resolving this meta-ticket. Since the introduction of ROW-based replication before filtering, no recurring issues have occurred. The few remaining issues are no longer replication problems but pending operational issues. Closing as resolved because, with the current architecture, recurring data drift is unlikely, and even if it did happen, a full data reload is now possible, so the problem is fully solvable.

sguebo_WMF closed subtask Restricted Task as Resolved. Jun 28 2024, 2:38 PM

The Cloud-Services project tag is not intended to have any tasks. Please check the list on https://phabricator.wikimedia.org/project/profile/832/ and replace the tag on this task with a more specific project tag. Thanks!