We got paged for shellbox unavailability again today (previously tracked in T310557), and this time concluded (based on Shellbox\ShellboxError entries in logstash) that all the shellbox requests were associated with a single score edit:
https://en.wikipedia.org/w/index.php?title=Pictures_at_an_Exhibition&diff=1096844863&oldid=1096808949&diffmode=source
Note that the shellbox activity lasted for about 23 minutes, and the edit timestamp is at the end of that interval.
The reason appears to be background parsing associated with VisualEditor. The MWExtensionDialog as used in Score has the default 0.25s preview debounce, meaning we're shelling out to LilyPond through Shellbox every quarter-second while the user is typing -- regardless of whether an existing shellout is in flight. That interval is reasonable for the many parsing applications that finish in well under 0.25s, but for something as heavy as these score parses we should extend it, which would cut down the request rate to shellbox.
(One point of clarification: I originally thought the debounce value meant "we'll parse after the user stops typing for 250 ms." That still seemed like something we'd plausibly run into here: I imagined a user with a musical score on their desk, transcribing one or two notes at a time and then looking back at the score. But @Legoktm experimented with typing continuously, only a little slower than normal, and generated 400 requests to shellbox -- so it seems like these may be fired off every 250 ms, whether the user is still typing or not.)
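To make the two readings of "debounce" concrete, here is a minimal simulation sketch. It is not VisualEditor code and does not claim to match how MWExtensionDialog is implemented; `countTrailingDebounce`, `countThrottle` and the keystroke timeline are hypothetical names and numbers chosen for illustration. It only counts how many parses each policy would start for a user typing steadily.

```typescript
// Hypothetical simulation, not VisualEditor code.

// Reading 1: trailing debounce. A parse starts only after the user has
// been silent for waitMs following a keystroke.
function countTrailingDebounce(keystrokesMs: number[], waitMs: number): number {
  let fired = 0;
  keystrokesMs.forEach((t, i) => {
    const next = keystrokesMs[i + 1];
    // This keystroke leads to a parse only if no other keystroke
    // arrives within waitMs of it.
    if (next === undefined || next - t >= waitMs) {
      fired++;
    }
  });
  return fired;
}

// Reading 2: fixed rate (throttle). A parse starts at most once per
// intervalMs for as long as typing continues.
function countThrottle(keystrokesMs: number[], intervalMs: number): number {
  let fired = 0;
  let lastFiredMs = -Infinity;
  for (const t of keystrokesMs) {
    if (t - lastFiredMs >= intervalMs) {
      fired++;
      lastFiredMs = t;
    }
  }
  return fired;
}

// Assumed timeline: one keystroke every 200 ms, 500 keystrokes in total.
const keystrokes: number[] = Array.from({ length: 500 }, (_, i) => i * 200);

const debouncedParses = countTrailingDebounce(keystrokes, 250);
const throttledParses = countThrottle(keystrokes, 250);
console.log({ debouncedParses, throttledParses });
```

Under the first reading, continuous typing produces a single parse at the end; under the second, the request count grows with how long the user keeps typing, which is closer to what the experiment showed.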
We might also want to experiment with changing from a fixed rate limit to a fixed concurrency limit: that is, start a parse only if another parse isn't already in progress. That would limit the effect on the infrastructure: no matter how fast or slow a user types, their background parses tie up at most one shellbox replica at a time. (We'd want those requests to have a fairly short deadline, so that if the parse fails for whatever reason, it doesn't mean that background parsing stops for a full 60 seconds, or worse, indefinitely.)
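A rough sketch of what the concurrency-limit idea could look like, assuming nothing about the actual VisualEditor or Score code: `SingleFlightGate`, `simulate`, and all the timings below are hypothetical. The gate allows one parse at a time, and once the deadline has passed it stops treating the old parse as in flight, so a hung request cannot block previews forever. In a real client the timed-out request would also need to be aborted; this sketch only models the bookkeeping.

```typescript
// Hypothetical sketch, not VisualEditor code.
class SingleFlightGate {
  private deadlineMs: number;
  private current: { id: number; startedAtMs: number } | null = null;
  private nextId = 1;

  constructor(deadlineMs: number) {
    this.deadlineMs = deadlineMs;
  }

  // Returns a ticket id if a parse may start at nowMs, or null if
  // another parse is still in flight and within its deadline.
  tryStart(nowMs: number): number | null {
    if (this.current !== null && nowMs - this.current.startedAtMs < this.deadlineMs) {
      return null;
    }
    const id = this.nextId++;
    this.current = { id, startedAtMs: nowMs };
    return id;
  }

  // Releases the slot, but only if the ticket is still the current one,
  // so a stale (timed-out) parse cannot release a newer parse's slot.
  finish(id: number): void {
    if (this.current !== null && this.current.id === id) {
      this.current = null;
    }
  }
}

// Simulates preview triggers arriving every triggerEveryMs and counts
// how many parses actually start when gated.
function simulate(
  triggerEveryMs: number,
  durationMs: number,
  parseTakesMs: number,
  deadlineMs: number
): number {
  const gate = new SingleFlightGate(deadlineMs);
  let started = 0;
  let running: { id: number; doneAtMs: number } | null = null;
  for (let now = 0; now < durationMs; now += triggerEveryMs) {
    if (running !== null && now >= running.doneAtMs) {
      gate.finish(running.id);
      running = null;
    }
    const id = gate.tryStart(now);
    if (id !== null) {
      started++;
      running = { id, doneAtMs: now + parseTakesMs };
    }
  }
  return started;
}

// Assumed numbers: a trigger every 250 ms for 60 s, a 5 s deadline.
const healthyStarts = simulate(250, 60000, 3000, 5000); // parses take 3 s
const hungStarts = simulate(250, 60000, Infinity, 5000); // parses never return
console.log({ healthyStarts, hungStarts });
```

With these assumed numbers, 240 triggers in a minute turn into 20 parses when each takes 3 seconds, and 12 when every parse hangs until the deadline.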
@TheresNoTime also noted that Real Time Preview will have a similar problem, tracked separately at T312318.
Thanks to @jhathaway @Krinkle @Legoktm @Perryprog @TheresNoTime for their work digging into this.