Phabricator

MediaWiki uploads files to Swift in eqiad and codfw in serial, not parallel
Open, Needs Triage, Public

Description

On Wikimedia wikis, MediaWiki uploads copies of files to both the eqiad and codfw datacenters. Despite SwiftFileBackend using MultiHttpClient, these uploads are done one after another instead of in parallel. Doing them in parallel would cut the overall time and help large file uploads complete in time.

Event Timeline

This seems to be on purpose, twice over. FileBackendMultiWrite::doOperationsInternal() does the master write first, and then proceeds with the replica write only if the master write succeeded. This is an effort to keep them consistent. Maybe overkill? It's not like the primary write is reverted if the replica write fails, so they won't be consistent anyway if the data stores are unreliable.
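To make the serial behaviour concrete, here is a minimal Python sketch of the control flow described above (the real logic is PHP in FileBackendMultiWrite::doOperationsInternal; the function and field names here are hypothetical illustrations, not MediaWiki APIs):

```python
def do_operations(master_write, replica_write):
    """Serial, consistency-first flow: do the master write first and
    only attempt the replica write if the master write succeeded."""
    master_ok = master_write()
    if not master_ok:
        # Replica is never tried, keeping the two stores consistent
        # for the failure case.
        return {"ok": False, "replica_attempted": False}
    replica_ok = replica_write()
    # Note: the master write is NOT reverted if the replica write
    # fails, so the two stores can still diverge here.
    return {"ok": True, "replica_attempted": True, "replica_ok": replica_ok}
```

This makes the total latency the sum of both writes, which is what the task is about.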

The write to the replica can be deferred to post-send, but that is explicitly disabled in production with 'replication' => 'sync'.

I think the intent was to have better consistency for doOperations() calls with only a single-step file operation (e.g. not a move, in Swift's case). If a PUT request fails in the master backend, then not trying it on the second backend keeps things consistent. This doesn't cover the edge case of a non-answer (e.g. a timeout or 503) where the object was ultimately saved anyway.

I don't see a fundamental reason that it can't just be fully concurrent, as long as all the accessibility/precheck steps are still done first as normal. The resulting status could then just come from the master backend's results.
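The proposed concurrent variant could look roughly like this Python sketch (again hypothetical names, not the MediaWiki implementation), assuming prechecks have already passed and the overall status is taken from the master result alone:

```python
from concurrent.futures import ThreadPoolExecutor

def do_operations_concurrent(master_write, replica_write):
    """Issue the master and replica writes in parallel; the overall
    status reflects only the master backend."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        master_future = pool.submit(master_write)
        replica_future = pool.submit(replica_write)
        master_ok = master_future.result()
        replica_ok = replica_future.result()
    # A replica failure does not fail the operation; it would be
    # surfaced via logging or an out-of-band reconciliation job.
    return {"ok": master_ok, "replica_ok": replica_ok}
```

Total latency then becomes the slower of the two writes rather than their sum, at the cost of sometimes writing to the replica when the master write fails.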

@tstarling @Bawolff If either of you want to take this on, it seems like a worthwhile task. I could help with review.