Provide resource for db access in grid
Closed, Declined · Public

Assigned To

None

Authored By

	Merl
	Jul 30 2014, 10:16 PM

Description

Currently, long maintenance runs are being done on mariadb10, and future updates and similar work will follow. Having an SQL resource would prevent SGE scripts from running while the database is down or broken, and would schedule them later, once the database is available again.

On Toolserver, many different SGE resources were defined. On Labs there will probably be only three database servers, so one resource for the replicated databases, one for tools-db, and one for PostgreSQL should be enough.

This resource should be set to 0 while maintenance is under way or replication is broken. This could be done manually or by a load sensor script.
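A minimal sketch of what such a load sensor script might look like, assuming the usual Grid Engine load sensor protocol (the sensor waits for a line on stdin, stops on `quit`, and otherwise prints a `begin` / `host:resource:value` / `end` report). The resource name `sql_replica` and the `database_available` check are hypothetical placeholders, not anything defined on Labs.

```python
import sys

RESOURCE = "sql_replica"  # hypothetical resource name, not an existing complex


def format_report(host, values):
    """Build one load report block: begin, host:name:value lines, end."""
    lines = ["begin"]
    for name, value in values.items():
        lines.append(f"{host}:{name}:{value}")
    lines.append("end")
    return "\n".join(lines) + "\n"


def database_available():
    """Placeholder check (assumption): replace with a real probe,
    e.g. a connection attempt or a replication-lag query."""
    return True


def run_sensor(stdin=sys.stdin, stdout=sys.stdout, check=database_available):
    """Answer each request line with a report until 'quit' is received."""
    for line in stdin:
        if line.strip() == "quit":
            break
        value = 1 if check() else 0  # 0 while maintenance/replication is broken
        stdout.write(format_report("global", {RESOURCE: value}))
        stdout.flush()


# run_sensor()  # call this when installing the file as a load sensor
```

Jobs would then request the resource at submit time, so the scheduler holds them back while the reported value is 0. Setting the value manually would remain possible as a fallback.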

For this purpose the resource only needs to be a simple indicator; it does not need to be consumable. On Toolserver the resource was consumable, which limited the number of queries running at the same time and prevented heavy peak usage. I don't know whether this is needed on Labs, too.
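The difference between the two kinds of resource can be illustrated with a toy model. This is only a sketch of the idea, not how the grid scheduler is actually implemented; the `Resource` class and `try_start` function are made up for the illustration.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    value: int        # 0 = unavailable; for a consumable, the remaining slots
    consumable: bool  # True: each running job uses up part of the value


def try_start(resource, request=1):
    """Return True if a job requesting `request` units may start now."""
    if resource.value < request:
        return False  # database down, or no slots left
    if resource.consumable:
        resource.value -= request  # only consumables are decremented
    return True


indicator = Resource(value=1, consumable=False)
limited = Resource(value=2, consumable=True)

# An indicator admits any number of jobs while it is non-zero.
indicator_results = [try_start(indicator) for _ in range(3)]
# A consumable admits jobs only until its slots are used up.
limited_results = [try_start(limited) for _ in range(3)]
```

In this model the indicator only answers "is the database usable right now?", while the consumable additionally caps concurrency, which is what the Toolserver setup used it for.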

Version: unspecified
Severity: normal

Details

Reference: bz68881

Related Objects

Status	Subtype	Assigned	Task
Open	Feature	None	T18660 Database table cleanup (tracking)
Declined		None	T87716 Missing rows from categorylinks on production servers (dewiki)
Invalid	Feature	None	T69556 merl tools (tracking)
Declined		None	T70881 Provide resource for db access in grid

Event Timeline

• bzimport raised the priority of this task to Needs Triage. Nov 22 2014, 3:37 AM

• bzimport added a project: Toolforge.

• bzimport set Reference to bz68881.

Merl created this task. Jul 30 2014, 10:16 PM

coren moved this task from Backlog to Ready to be worked on on the Toolforge board. Nov 25 2014, 4:09 PM

coren removed coren as the assignee of this task. Mar 25 2015, 7:36 PM

coren triaged this task as Low priority.

coren set Security to None.

Aklapper added a project: Cloud-Services. Oct 24 2015, 8:02 PM

Restricted Application added a subscriber: Aklapper. Oct 24 2015, 8:02 PM

Today dewiki has had high replag for about 12 hours (more than 3 hours of lag).

Many of my SGE jobs are currently testing replag and rescheduling themselves (return code 99), and have been doing so for hours now. This is the recommended behavior, as the DB admins advised a long time ago. I still think there should be a resource for this.

Unfortunately, we don't have the in-house knowledge to implement and maintain such a custom resource. I think 'check-and-reschedule' is a sane workaround, with the added advantage that it can work with any combination of checks one wishes to use.
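For illustration, the check-and-reschedule pattern discussed in these comments could be sketched as follows. The function names and the 3-hour threshold are assumptions made for the example, and how the current lag is measured is deliberately left to the caller, since the task does not specify it.

```python
RESCHEDULE = 99  # return code the jobs above use to reschedule themselves


def replag_ok(lag_seconds, max_lag_seconds=3 * 3600):
    """Hypothetical check: True if replication lag is within the limit."""
    return lag_seconds <= max_lag_seconds


def exit_code(checks):
    """Return 0 if every check passes, otherwise the reschedule code.

    `checks` is any iterable of zero-argument callables returning a bool,
    so arbitrary combinations of checks can be plugged in.
    """
    for check in checks:
        if not check():
            return RESCHEDULE
    return 0
```

A job would evaluate something like `exit_code([lambda: replag_ok(current_lag)])` before doing any database work and, if the result is non-zero, exit with that status (for example via `sys.exit`) so that the grid runs it again later.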
