Things my team is working on: MediaWiki-Platform-Team
Side projects I am working on (or planning to, eventually): User-Tgr
You can find more info about me on my user page.
User Details
- User Since
- Sep 19 2014, 4:55 PM (511 w, 2 d)
- Availability
- Available
- IRC Nick
- tgr
- LDAP User
- Gergő Tisza
- MediaWiki User
- Tgr (WMF) [ Global Accounts ]
Yesterday
BTW this error occurred when editing a page again right after creating a temporary account, which seems like a bug somewhere in EditPage - there isn't really any reason to autocreate in that situation.
ActorCache seems suspect more generally as well since it has no concept of query flags at all. Here that fails in the more unusual direction where we make a replica read and assume that getting a result means the user is present in the replica (probably the calling code's fault as the interface makes no such contract). But, as far as I see, it could also fail in the opposite direction where we try to read from primary and get a cached value that was the result of an earlier replica read (e.g. during an ActorStore::findActorId() call).
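A minimal sketch of that second failure direction, written in Python rather than MediaWiki's PHP. Every name here (`FlaglessCache`, `SourceAwareCache`, `READ_REPLICA`, `READ_PRIMARY`) is hypothetical and does not reflect the real ActorCache interface; it only illustrates why a cache that does not record where a value came from cannot honor a request for primary data.

```python
READ_REPLICA = "replica"
READ_PRIMARY = "primary"


class FlaglessCache:
    """Hypothetical cache that ignores which database a value came from."""

    def __init__(self):
        self._values = {}

    def get(self, key, source, loader):
        # The requested source is only used on a cache miss.
        if key not in self._values:
            self._values[key] = loader(source)
        return self._values[key]


class SourceAwareCache:
    """Hypothetical cache that remembers the source of each cached value."""

    def __init__(self):
        self._values = {}  # key -> (value, source)

    def get(self, key, source, loader):
        cached = self._values.get(key)
        if cached is not None:
            value, cached_source = cached
            # A replica-sourced value must not satisfy a primary read.
            if source == READ_REPLICA or cached_source == READ_PRIMARY:
                return value
        value = loader(source)
        self._values[key] = (value, source)
        return value


def make_loader(primary, replica, key):
    """Build a loader that reads the key from the requested database."""
    def load(source):
        db = primary if source == READ_PRIMARY else replica
        return db.get(key)
    return load


# The actor exists on the primary but has not been replicated yet.
primary = {"Alice": 42}
replica = {}
load = make_loader(primary, replica, "Alice")

flagless = FlaglessCache()
stale = flagless.get("Alice", READ_REPLICA, load)
still_stale = flagless.get("Alice", READ_PRIMARY, load)

aware = SourceAwareCache()
aware.get("Alice", READ_REPLICA, load)
fresh = aware.get("Alice", READ_PRIMARY, load)
```

In this model the flagless cache keeps returning the replica's miss even for a primary read, while the source-aware cache goes back to the primary.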
I think I understand what's going on. When Kosta showed this problem earlier, there were two calls to AuthManager::autoCreateUser(); the first actually autocreated, and the second should have followed this code path:
if ( $localId ) {
    $user->setId( $localId );
    $user->loadFromId( $flags );
    if ( $login ) {
        $remember = $source === self::AUTOCREATE_SOURCE_TEMP;
        $this->setSessionDataForUser( $user, $remember );
    }
    return Status::newGood()->warning( 'userexists' );
}
but loadFromId() loaded from the replica where the user didn't exist yet, so $user ended up as the anonymous user, and setSessionDataForUser() effectively logged the user out.
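The race can be modeled in a few lines of Python. This is a sketch under stated assumptions, not MediaWiki code: the `User` class, `second_autocreate_call()` and the dictionaries standing in for the primary and replica databases are all hypothetical, and the model assumes (as described above) that a failed load leaves the user object anonymous.

```python
ANONYMOUS_ID = 0


class User:
    """Hypothetical user object; a failed load leaves it anonymous."""

    def __init__(self):
        self.id = ANONYMOUS_ID
        self.name = None

    def load_from_id(self, user_id, db):
        row = db.get(user_id)
        if row is None:
            # Assumed behavior: no row found, so fall back to anonymous.
            self.id = ANONYMOUS_ID
            self.name = None
        else:
            self.id = user_id
            self.name = row


def second_autocreate_call(local_id, db, session):
    """Model of the 'user already exists' path: load, then set the session."""
    user = User()
    user.load_from_id(local_id, db)
    session["user_id"] = user.id
    return user


# The first call created user 7 on the primary; the replica is lagging.
primary = {7: "~2024-1"}
replica = {}

lagged_session = {}
second_autocreate_call(7, replica, lagged_session)

primary_session = {}
second_autocreate_call(7, primary, primary_session)
```

With the lagging replica the session ends up holding the anonymous ID, which is the effective logout; reading from the primary keeps the freshly created user in the session.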
There is also a separate batch of errors (88 in last 7 days) with a different stack-trace-in-message:
Cannot execute Wikimedia\Rdbms\Database::runOnTransactionIdleCallbacks critical section while session state is out of sync.
There's a bunch of similar tasks:
- T287703: Uncaught Wikimedia\Rdbms\DBUnexpectedError: Cannot execute Wikimedia\Rdbms\Database::rollback critical section while session state is out of sync.
- T317237: Wikimedia\Rdbms\DBUnexpectedError: Cannot execute Wikimedia\Rdbms\Database::rollback critical section while session state is out of sync
- T328043: Wikimedia\Rdbms\DBUnexpectedError: Cannot execute Wikimedia\Rdbms\Database::selectDomain critical section while session state is out of sync.
- T343697: Wikimedia\Rdbms\DBUnexpectedError: Cannot execute Wikimedia\Rdbms\Database::selectDomain critical section while session state is out of sync.
but they have very different stack traces, so I assume this is a problem with application code, not the RDBMS library. Not sure what's the right component for UserEditCountTracker though.
The root problem is that it isn't really documented that the Authority methods need a good Status object as input. Also, arguably, that requirement is not a very good match for how Status works; more generally, we have a Status object which is a collection of errors and can be merged into other Status objects, but then we create Status subclasses with their own added behavior, which makes the whole merging behavior unreliable. So in the longer term IMO we should either get rid of PermissionStatus, get rid of Status::merge, or replace subclassing with some sort of composition-based mechanism that works with merging (probably not worth the complexity).
authorizeAction() dies if the Status that is passed in already has errors (not sure that's useful, but it has been that way since the creation of that class). ApiPurge reuses the same Status object for the purge and linkpurge permission checks (which are for two separate actions; if one fails the check, the other is still done). So presumably this happens when someone uses the forcelinkupdate or forcerecursivelinkupdate option and the purge check gets ratelimited. Has probably been going on for a long time.
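A small Python model of the reuse problem, assuming only what is described above. The `PermissionStatus` class, `authorize_action()` and the `ValueError` are hypothetical stand-ins and not the real MediaWiki classes or exception types; the point is only that a fresh status per check avoids the failure.

```python
class PermissionStatus:
    """Hypothetical stand-in: a status is good while it has no errors."""

    def __init__(self):
        self.errors = []

    def is_good(self):
        return not self.errors


def authorize_action(action, allowed, status):
    """Model of a check that refuses a status which already has errors."""
    if not status.is_good():
        raise ValueError("status must be good before authorizing " + action)
    if action not in allowed:
        status.errors.append("not allowed: " + action)
    return status.is_good()


def check_shared(actions, allowed):
    """Reuses one status for every action, as in the described bug."""
    status = PermissionStatus()
    return {a: authorize_action(a, allowed, status) for a in actions}


def check_separately(actions, allowed):
    """Uses a fresh status for each independent action."""
    return {a: authorize_action(a, allowed, PermissionStatus()) for a in actions}


actions = ["purge", "linkpurge"]
allowed = {"linkpurge"}  # e.g. the purge check was rate limited

separate_result = check_separately(actions, allowed)

try:
    check_shared(actions, allowed)
    shared_failed = False
except ValueError:
    shared_failed = True
```

With a shared status, the failed purge check poisons the later linkpurge check; with separate statuses each action gets its own verdict.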
Sat, Jul 6
Oh, I see, this is coming from AuthenticationHandler::validate() not AuthenticationHandler::getAuthorizationProvider() (which should probably have a more accurate error message, in any case).
Fri, Jul 5
Thu, Jul 4
The reason for this (very poorly worded) error seems to be that the REST API does not see the grant_type parameter, but I have no idea how adding an extra parameter would cause that.
Failing due to an extra parameter sounds like T360434: REST: request body validation should fail if unexpected fields are present (cc @daniel). It would probably be better for that to either be a warning (I guess that concept would have to be introduced to the REST API first, but it seems useful well beyond this use case) or to follow the B/C break process.
I guess the TitleQuickPermissions hook returns false without actually setting an error?
Wed, Jul 3
Tue, Jul 2
Mon, Jul 1
I have seen other APT-related HTTP errors breaking a puppet run. It happens every once in a while, I don't think we have much control over it.
This will be harder than I thought as there is no Vagrant base box for Bullseye + amd64 + LXC. We'll either have to build our own per https://github.com/fgrehm/vagrant-lxc/blob/master/BOXES.md (there are some extremely old tutorials here and here) or finally migrate off Vagrant (T322991: Consider another orchestration system for Wikispore).
Vagrant in Cloud VPS relies on vagrant-lxc which has been unmaintained for a while and does not work on Debian Bookworm, so the next server upgrade (at our typical pace, two years from now) is going to be the end of life for the current Wikispore infrastructure.
Sun, Jun 30
Fri, Jun 28
I think this is as much as we want to do for now so feel free to re-test. Proper logins still won't work but autologin & co. should.
Thu, Jun 27
Wed, Jun 26
I added a virtual domain parameter to Template:Extension on mediawiki.org. In theory it could be added automatically to the relevant pages via Tool-extjsonuploader, but that's not done for now.
Tue, Jun 25
With the patches that have landed, it is now possible to disallow key management other than the removal of keys on a given wiki, so we could prevent new users from enabling WebAuthn while reaching out to existing users and asking them to migrate. So we can call this task done.
Mon, Jun 24
Gerrit 3.10 introduces a more structured way for suggestions: https://www.gerritcodereview.com/3.10.html#allow-fixes-in-human-comments-via-rest-api
Robot Comments have been officially deprecated for some time and the checks API framework is recommended since Gerrit 3.6. This is great as it provides a way of greatly reducing the size of the repository. However, unlike fixes suggested in Robot-Comments, Human Suggested fixes could not be applied programmatically, until now.
Gerrit 3.10 introduces a way of suggesting fixes that could then be programmatically applied with the Apply Stored Fixes endpoint. This is done by adding an extra field fix_suggestions in CommentInfo that will be stored separately on NoteDB.
(...)
The experiment can be enabled in gerrit.config like so
[experiments]
  enabled = GerritBackendFeature__allow_fix_suggestions_in_comments
(In 3.9 the bot would instead have to make a normal comment and wrap the code snippet in ```...```. Not sure what the advantages / disadvantages of the new method are.)
Another old task to revisit is T73773: Get rid of lazy-loading of unattached accounts from CentralAuth.
That would probably also be a good time to review the remaining SUL-Finalization tasks.
Thanks for the reminder @Bugreporter, I didn't think of this but it's definitely a blocker for SUL3 (although for the MVP we can leave it on the local wiki).
Once we only use loginwiki to do password or 2FA change, existing accounts not attached to CentralAuth will break.
Adjusting status to more accurately reflect the outcome.
Sun, Jun 23
Actually, I don't think unattached accounts can log in at all. That was disabled way back in rOMWC165ecbfaba66: Set $wgCentralAuthStrict = true;.
There are three relevant autocreation scenarios on Wikimedia wikis:
- autocreation from the session provider in Setup.php, when there is a valid central session cookie but no local session cookie (e.g. you are visiting ab.wikipedia.org while already logged in to en.wikipedia.org)
- autocreation from the primary authentication provider after a successful login
- autocreation from Special:CentralAutoLogin/setCookies on successful autologin or edge login
(There's also autocreation on loginwiki via Special:CentralLogin right after registration, and forced autocreation by another user via special page or script, but the user's global rights are non-existent or irrelevant in those scenarios.)
Sat, Jun 22
Autocreate errors are cached for 5 minutes for performance (since autocreate is attempted on every pageview), so assuming the initial retries are within that time frame, this is working as expected.
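The behavior can be sketched as a negative cache with a time-to-live. This is an illustration in Python, not the actual MediaWiki implementation: the class name, the `attempt()` signature and the injected `now` timestamp are hypothetical; only the 5 minute lifetime comes from the comment above.

```python
class AutocreateErrorCache:
    """Hypothetical negative cache: remember failures for a fixed time."""

    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self._failures = {}  # username -> (error, expires_at)

    def attempt(self, username, create, now):
        """Return the error message, or None if autocreation succeeded."""
        cached = self._failures.get(username)
        if cached is not None and now < cached[1]:
            # Still within the TTL: report the old error without retrying.
            return cached[0]
        error = create(username)
        if error is None:
            self._failures.pop(username, None)
        else:
            self._failures[username] = (error, now + self.ttl)
        return error


calls = []


def create(username):
    """Fails on the first real attempt, succeeds afterwards."""
    calls.append(username)
    return "blocked" if len(calls) == 1 else None


cache = AutocreateErrorCache()
first = cache.attempt("Alice", create, now=0)
retry_within_ttl = cache.attempt("Alice", create, now=100)
calls_within_ttl = len(calls)
retry_after_ttl = cache.attempt("Alice", create, now=301)
```

A retry inside the 5 minute window returns the cached error without touching the creation logic, which matches the "working as expected" reading; only a retry after expiry actually tries again.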
Fri, Jun 21
Thu, Jun 20
Per @Seddon, breaking changes to the API should not happen for at least 12, preferably 24, months after the apps were updated for the new behavior. So we'll need to use different defaults for the SUL3 feature flag on the API and web (which on second thought isn't particularly confusing - to API users it will just be exposed as a normal API parameter, they won't know or care that a similar feature flag exists on the web).
Yeah we could do that. Or we could use something like authentication.wikimedia.org.
It's an open bikeshed! Others have suggested "accounts" too. (login.wikimedia.org is already in use.) It's more understandable but also not very accurate since the domain will not be used for account management in general.
Wed, Jun 19
As an aside, can this requirement be documented at Writing an extension for deployment, preferably with an explanation of what "basic support" means? It's disrespectful of volunteer contributors' time, to put it mildly, if they only find out at the end of a long development and review process that their extension cannot be deployed for reasons they have no control over.
Mon, Jun 17
The amount of details reported seems pretty minimal: https://zh.wikipedia.org/wiki/Special:%E6%BB%A5%E7%94%A8%E6%97%A5%E5%BF%97/5138031
This is user error (invalid OAuth signature). The problem on our side is that it ends up in the exception channel which should not happen for an ApiUsageException. It should be handled by ApiMain::handleApiBeforeMainException() which AFAICS does not log there (unfortunately since the stack trace reflects where the exception was created, not where it was logged, it's hard to be sure what's going on).
Fri, Jun 14
I'll look into this. IIRC mwv-apt-02 is used as an APT source by Vagrant; not sure how useful/important it is.
We expect this to be done on time.
Thu, Jun 13
We would have to maintain another mapping (besides domain <=> DB name and domain <=> site/lang) then. Other than that, it's straightforward.