Contention on User::getActorId ?
Open, Low | Public | Production Error

Description

User::getActorId
Lock wait timeout exceeded; try restarting transaction (10.64.0.205)
INSERT IGNORE INTO `actor` (actor_user,actor_name) VALUES (NULL,[USER])

Please look at the relevant log entries at the time: https://logstash.wikimedia.org/goto/ac63a60619438f0c8b16ffc036d89e1a

There are also at the same time:

Column 'rc_this_oldid' cannot be null (10.64.0.205)

This could just be a consequence of something else, or it could be causing other issues during fast editing, but I would like someone to take a look at it (even if it is closed immediately as a rare event). I find it interesting that a "get" causes an INSERT, but I am guessing it is not an HTTP GET and is more like a get-or-create (insert if the row does not exist).
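To illustrate the get-or-create reading above, here is a minimal sketch, not MediaWiki's actual code. It uses Python's sqlite3 module with an in-memory table; the helper names (`make_demo_db`, `get_or_create_actor_id`), the simplified `actor` schema and the use of SQLite's `INSERT OR IGNORE` in place of MySQL's `INSERT IGNORE` are all assumptions made for the example.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with a simplified, hypothetical actor table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE actor ("
        " actor_id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " actor_user INTEGER,"
        " actor_name TEXT NOT NULL UNIQUE)"
    )
    return conn


def get_or_create_actor_id(conn, name):
    """Return the actor_id for `name`, inserting a row first if none exists."""
    # INSERT OR IGNORE is SQLite's counterpart of MySQL's INSERT IGNORE:
    # the insert is skipped when the unique actor_name already exists.
    conn.execute(
        "INSERT OR IGNORE INTO actor (actor_user, actor_name) VALUES (NULL, ?)",
        (name,),
    )
    # Whether the insert happened or was ignored, the row now exists.
    row = conn.execute(
        "SELECT actor_id FROM actor WHERE actor_name = ?", (name,)
    ).fetchone()
    return row[0]
```

The point of the sketch is that the "get" always attempts a write first. On a server such as MySQL, concurrent callers asking for the same name may therefore wait on each other's locks, which would be consistent with the lock wait timeout reported here, though that is not confirmed by this example.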

I don't think it is a huge issue (hence the lower priority), but I would like someone to take a look and see whether it is a potential contention issue and a candidate for performance optimization. The latest server versions have more tools to handle concurrency: https://mysqlserverteam.com/mysql-8-0-1-using-skip-locked-and-nowait-to-handle-hot-rows/

Event Timeline

While it would be interesting to figure out the User::getActorId lock wait timeout, I see only 5 in the linked timeframe (and only 6 in the past 24 hours). They seem related to the same Special:Import as the rc_this_oldid error mentioned, but it's not obvious to me how or why. Most seem to be from job runners trying to process CategoryMembershipChangeJobs after the fact; just one is directly from the import.

There are many instances of the rc_this_oldid error you mentioned, all from the same request using Special:Import. The use of DeferredUpdate is obscuring the source of the insertion, but I'd guess it's probably that a code path can reach ImportReporter.php line 161 with $nullRevId being null (which should be fixed somehow, since the method called on line 161 doesn't accept null). Based on other log messages in the same request, that in turn was probably caused by the same failure-to-load that's behind T205675 (see T205675#4723071 in particular), which caused $nullRevision to not be created.

It is OK to close this if it is a duplicate, or if you think it is unlikely to happen again or is a very rare occurrence. I just report things FYI when I see something out of the ordinary in the logs, but I lack the knowledge for a deep analysis.

There are two parts here:

  1. Fix ImportReporter to handle $nullRevId being null in some appropriate manner.[1] This task could be used for that, or we could make a new one.
  2. Figure out what's going on with T205675 to fix the underlying cause. That should be done in that task.

[1]: @#core-platform-team: I don't know what would be best here: passing 0 to mean "no revision" in the RC entry, loading the ID of the top revision of the page in the absence of a null revision, or throwing an exception due to the failure to create the null revision.
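For discussion, the three options in the footnote above can be laid out side by side. This is only an illustrative Python sketch, not ImportReporter code: the function `resolve_rc_this_oldid`, the enum `MissingNullRevisionPolicy` and the parameter names are hypothetical, and the real fix would live in MediaWiki's PHP.

```python
from enum import Enum
from typing import Optional


class MissingNullRevisionPolicy(Enum):
    """Hypothetical names for the three options raised in the footnote."""
    USE_ZERO = "zero"      # pass 0 to mean "no revision" in the RC entry
    USE_LATEST = "latest"  # fall back to the page's top revision ID
    RAISE = "raise"        # treat the missing null revision as an error


def resolve_rc_this_oldid(
    null_rev_id: Optional[int],
    latest_rev_id: Optional[int],
    policy: MissingNullRevisionPolicy,
) -> int:
    """Pick the revision ID to record, never returning None."""
    if null_rev_id is not None:
        # Normal case: the null revision was created, so use its ID.
        return null_rev_id
    if policy is MissingNullRevisionPolicy.USE_ZERO:
        return 0
    if policy is MissingNullRevisionPolicy.USE_LATEST and latest_rev_id is not None:
        return latest_rev_id
    # Either the policy is RAISE, or there is no latest revision to fall back to.
    raise RuntimeError("null revision was not created and no fallback applies")
```

Whichever policy is chosen, the caller never passes null on to the recent-changes insert, which is the immediate cause of the "Column 'rc_this_oldid' cannot be null" error.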

mmodell changed the subtype of this task from "Task" to "Production Error". Aug 28 2019, 11:06 PM
Aklapper changed the task status from Stalled to Open. Oct 19 2020, 4:34 PM

The previous comments don't explain who or what (task?) exactly this task is stalled on ("If a report is waiting for further input (e.g. from its reporter or a third party) and can currently not be acted on"). Hence resetting task status.

(Smallprint, as general orientation for task management: If you wanted to express that nobody is currently working on this task, then the assignee should be removed and/or priority could be lowered instead. If work on this task is blocked by another task, then that other task should be added via Edit Related Tasks...Edit Subtasks. If this task is stalled on an upstream project, then the Upstream tag should be added. If this task requires info from the task reporter, then there should be instructions which info is needed. If this task needs retesting, then the TestMe tag should be added. If this task is either out of scope and nobody should ever work on this, or nobody else managed to reproduce the problem described in this task, then this task should have the "Declined" status. If the task is valid but should not appear on some team's workboard, then the team project tag should be removed while the task has another active project tag.)

@PlatformEng Note my previous comment at T227739#5327312, made as the original reporter, which is in line with @Aklapper's comments.