Cirrus search does not prioritise master pages on their subpages
Open, Medium · Public · BUG REPORT

Description

Hi.
I tried to find the page "שמש:קיפודנחש" (the talk page of one of the users) in the hewiki search box, and started typing "שמש:קיפודנ", waiting for autocomplete. I got a list of the talk page's subpages with archives (<name>/archive/1 and so on), but not the master page (<name>). I believe the master page should be prioritised over its subpages.
It happened a couple of times last week, on different master pages, and looks like an exact continuation of T156840.

  • Enter "שמש:קיפודנ" in the hewiki wikisearch field.
  • Expected "שמש:קיפודנחש" in the autocomplete suggestions list.
  • Got: it wasn't.

Thank you.

Event Timeline

The talk page is indeed ranked very low. It seems quite recent (created in May 2024) and has 0 incoming_links, so it is far behind https://he.wikipedia.org/wiki/שיחת_משתמש:קיפודנחש/ארכיון_31_עד_מאי_2024 which has more than 3k incoming links. CirrusSearch indeed does not prioritize master pages over their subpages. If we want to do this, it would have to be carefully evaluated, because one thing we can't do is rank a subpage lower relative only to its own master page: all subpages would be down-ranked.
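To illustrate the point about down-ranking, here is a toy sketch. It is not CirrusSearch's actual scoring: `toy_score`, the log-scaled link count, the `subpage_penalty` parameter, and the page titles are all assumptions made up for the example. It only shows that a per-document subpage penalty affects every subpage, not just the subpages of the page being searched for.

```python
import math
from dataclasses import dataclass


@dataclass
class Page:
    title: str
    incoming_links: int


def is_subpage(title: str) -> bool:
    # Assumption: any title containing "/" is treated as a subpage.
    return "/" in title


def toy_score(page: Page, subpage_penalty: float = 1.0) -> float:
    # Hypothetical popularity score: log-scaled incoming links,
    # multiplied by a penalty factor when the page is a subpage.
    base = math.log1p(page.incoming_links)
    return base * subpage_penalty if is_subpage(page.title) else base


def rank(pages, subpage_penalty: float = 1.0):
    ordered = sorted(pages, key=lambda p: toy_score(p, subpage_penalty), reverse=True)
    return [p.title for p in ordered]


# Made-up pages that all match the prefix "Foo".
pages = [
    Page("Foo", 0),                # new master page, no incoming links
    Page("Foo/Archive 31", 3000),  # heavily linked archive subpage
    Page("Foobar/Guide", 500),     # useful subpage of an unrelated page
    Page("Foobaz", 50),
]

print(rank(pages))                       # no penalty
print(rank(pages, subpage_penalty=0.5))  # every subpage is penalised
```

In this toy data the penalty pushes the unrelated subpage "Foobar/Guide" below "Foobaz", while the master page "Foo" with zero incoming links still sits at the bottom.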

Maybe there is a way to add a rule to the algorithm: "if you show a subsubpage on line X, but its master page exists and wasn't shown above, then show the master page on line X+1"?
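The rule proposed here could be sketched as a post-processing step over an already ranked suggestion list. This is only an illustration under stated assumptions: `master_title`, `inject_master_pages`, the `existing_titles` set, and the idea that the master page is everything before the first "/" are hypothetical and not part of CirrusSearch.

```python
from typing import List, Optional, Set


def master_title(title: str) -> Optional[str]:
    # Assumption: the master page is everything before the first "/".
    head, sep, _ = title.partition("/")
    return head if sep else None


def inject_master_pages(
    suggestions: List[str], existing_titles: Set[str], limit: int = 10
) -> List[str]:
    """After each subpage, insert its master page if that page exists
    and has not already been shown higher up in the list."""
    result: List[str] = []
    seen: Set[str] = set()
    for title in suggestions:
        if title in seen:
            continue  # already injected earlier as a master page
        result.append(title)
        seen.add(title)
        master = master_title(title)
        if master and master in existing_titles and master not in seen:
            result.append(master)
            seen.add(master)
    return result[:limit]


suggestions = ["Foo/Archive 1", "Foo/Archive 2", "Bar"]
existing = {"Foo", "Foo/Archive 1", "Foo/Archive 2", "Bar"}
print(inject_master_pages(suggestions, existing))
```

Note that the sketch needs a lookup of which titles exist and truncates the list to `limit`, so an injected master page can push a lower-ranked suggestion out of the list.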

Other than @IKhitron's suggestion to add the main page to the list when suggesting the sub-page, is there anything here that isn't covered by T159861: Add an is_subpage field to elasticsearch documents and use as a scoring feature, which explicitly takes the sub-page status into account? Should we merge the tickets?

Maybe there is a way to add a rule to the algorithm: "if you show a subsubpage on line X, but its master page exists and wasn't shown above, then show the master page on line X+1"?

Unfortunately, search has no context awareness of these kinds of things. Because search is computed in a distributed way, pages are scored independently and have no knowledge of which pages come before or after them in the ranking. It is possible to tweak the result sets in post-processing, but that gets massively complicated when we start considering pagination. In the past, I think the only exception we have made is to inject a result into the top position for exact title matches (maybe, my memory could be fuzzy).
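A small sketch of why post-processing interacts badly with pagination. It assumes a naive approach in which each page of results is fetched and post-processed on its own; `master_of`, `inject`, `get_page`, and the titles are hypothetical, and the existence check for the master page is omitted for brevity.

```python
from typing import List, Optional


def master_of(title: str) -> Optional[str]:
    # Assumption: the master page is everything before the first "/".
    head, sep, _ = title.partition("/")
    return head if sep else None


def inject(page_results: List[str], page_size: int) -> List[str]:
    # Naive per-page post-processing: add each subpage's master page
    # right after it, then cut back down to the page size.
    out: List[str] = []
    for title in page_results:
        if title not in out:
            out.append(title)
        master = master_of(title)
        if master and master not in out:
            out.append(master)
    return out[:page_size]


def get_page(ranked: List[str], page_number: int, page_size: int) -> List[str]:
    # Each page only sees its own slice of the ranking.
    start = page_number * page_size
    return inject(ranked[start:start + page_size], page_size)


ranked = ["Foo/A1", "Foo/A2", "Bar", "Foo/A3", "Baz", "Qux"]
page0 = get_page(ranked, 0, 3)
page1 = get_page(ranked, 1, 3)
print(page0)
print(page1)
```

With this data, "Foo" is injected on both pages because the second page does not know what the first page showed, and "Bar" and "Qux" are truncated away and never appear on any page. Avoiding both problems would require carrying state across pages, which is the complication described above.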

Gehel triaged this task as Medium priority. Mon, Jul 8, 3:31 PM
Gehel moved this task from needs triage to Feature Requests on the Discovery-Search board.
Gehel subscribed.

Implementing this in a way that has no drawbacks for other use cases seems far from trivial. It's unlikely that the Search Platform team will find time to work on this in the near future.