Repopulate testing tool after removing new "instance of" categories
Closed, Resolved (Public)

Description

While testing the image results, we learned that several page types that should have been filtered out of the API results were not. Because these page types are likely to have bad results, we don't want them to sway the results of the test, so we should filter them out of the ratings.

The page types that should be removed from the results and from the tool are:

  1. instance of "point in time with respect to current timeframe" (Q14795564)
  2. instance of "century leap year" (Q3311614)
  3. instance of "family name" (Q101352)
  4. instance of "name" (Q82799)

Once these instances are removed, we will note in this ticket how many ratings in each language we have, so we can decide whether we need to do more ratings to fill the gap.

Note that this ticket only covers writing a script to remove these from the tool; the changes to prevent the API from returning them will be handled upstream in the API later (see T281680).
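
A minimal sketch of the check such a script might perform, assuming each article's Wikidata item is available as entity JSON in the standard Wikidata layout (`claims` keyed by property ID, where P31 is "instance of"). The function names and the sample entity are hypothetical and are not taken from the tool's code.

```python
# QIDs listed in this ticket as page types to exclude.
EXCLUDED_INSTANCE_OF = {
    "Q14795564",  # point in time with respect to current timeframe
    "Q3311614",   # century leap year
    "Q101352",    # family name
    "Q82799",     # name
}

INSTANCE_OF = "P31"  # Wikidata property ID for "instance of"


def instance_of_ids(entity):
    """Return the set of QIDs that the entity is an instance of."""
    ids = set()
    for statement in entity.get("claims", {}).get(INSTANCE_OF, []):
        # "novalue"/"somevalue" snaks have no datavalue, so guard each step.
        value = statement.get("mainsnak", {}).get("datavalue", {}).get("value", {})
        if isinstance(value, dict) and "id" in value:
            ids.add(value["id"])
    return ids


def is_unsuitable(entity):
    """True if the entity is an instance of any excluded page type."""
    return not EXCLUDED_INSTANCE_OF.isdisjoint(instance_of_ids(entity))


# Hypothetical sample entity: an item that is an instance of "family name".
sample = {
    "claims": {
        "P31": [
            {
                "mainsnak": {
                    "snaktype": "value",
                    "datavalue": {
                        "value": {"entity-type": "item", "id": "Q101352"},
                        "type": "wikibase-entityid",
                    },
                }
            }
        ]
    }
}
```

Articles for which `is_unsuitable` returns true would then be flagged in the tool's database rather than deleted, which matches how the later queries in this ticket filter on `unsuitableArticleType=0`.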

Acceptance Criteria:

  • instances of Q14795564, Q3311614, Q101352, and Q82799 are removed from the tool and the ratings results
  • the tool is updated to incorporate more potential matches to fill the gap
  • we've reported on how many results we have in each language both immediately before and immediately after the above are removed
  • we've decided whether we need to do more ratings

Event Timeline

Number of ratings without filtering pages:

select count(distinct resultFilePage), langCode
from unillustratedArticles
join imageRecommendations
  on unillustratedArticles.id = imageRecommendations.unillustratedArticleId
where rating is not null
group by langCode;
+--------------------------------+----------+
| count(distinct resultFilePage) | langCode |
+--------------------------------+----------+
|                            498 | ar       |
|                            560 | bn       |
|                            456 | ceb      |
|                            495 | cs       |
|                            633 | en       |
|                            686 | vi       |
+--------------------------------+----------+

Number of ratings with unsuitable pages filtered out:

select count(distinct resultFilePage), langCode
from unillustratedArticles
join imageRecommendations
  on unillustratedArticles.id = imageRecommendations.unillustratedArticleId
where rating is not null
  and unsuitableArticleType = 0
group by langCode;
+--------------------------------+----------+
| count(distinct resultFilePage) | langCode |
+--------------------------------+----------+
|                            493 | ar       |
|                            556 | bn       |
|                            449 | ceb      |
|                            480 | cs       |
|                            599 | en       |
|                            681 | vi       |
+--------------------------------+----------+
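
To illustrate what the `unsuitableArticleType=0` condition changes, here is a small self-contained sketch using Python's built-in `sqlite3` module with an in-memory database. Table and column names follow the queries above, but which table each column lives on, the column types, and all sample rows are assumptions made for illustration. The real tool's schema may differ, and the output format above suggests it runs on MySQL or MariaDB rather than SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed schema: only the column names come from the ticket's queries.
conn.executescript("""
CREATE TABLE unillustratedArticles (
    id INTEGER PRIMARY KEY,
    langCode TEXT,
    unsuitableArticleType INTEGER DEFAULT 0
);
CREATE TABLE imageRecommendations (
    unillustratedArticleId INTEGER,
    resultFilePage TEXT,
    rating INTEGER
);
""")

# Made-up sample rows.
conn.executemany("INSERT INTO unillustratedArticles VALUES (?, ?, ?)", [
    (1, "cs", 0),
    (2, "cs", 1),  # flagged as an unsuitable article type
    (3, "en", 0),
])
conn.executemany("INSERT INTO imageRecommendations VALUES (?, ?, ?)", [
    (1, "File:A.jpg", 1),
    (1, "File:B.jpg", None),  # not rated yet, excluded by both queries
    (2, "File:C.jpg", -1),    # rated, but belongs to a flagged article
    (3, "File:D.jpg", 1),
    (3, "File:E.jpg", 0),
])

QUERY = """
SELECT langCode, COUNT(DISTINCT resultFilePage)
FROM unillustratedArticles
JOIN imageRecommendations
  ON unillustratedArticles.id = imageRecommendations.unillustratedArticleId
WHERE rating IS NOT NULL {extra}
GROUP BY langCode
ORDER BY langCode
"""

unfiltered = dict(conn.execute(QUERY.format(extra="")).fetchall())
filtered = dict(
    conn.execute(QUERY.format(extra="AND unsuitableArticleType = 0")).fetchall()
)

print(unfiltered)  # {'cs': 2, 'en': 2}
print(filtered)    # {'cs': 1, 'en': 2}
```

In this toy data only `cs` loses a rated result, because its second article is flagged; the ratings themselves stay in the database and are simply left out of the count.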

... so we just need a few more ratings to reach our target in ar, ceb and cs
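
The shortfall can be worked out from the two tables above. The sketch below assumes a target of 500 results per language. The ticket does not state the target explicitly; 500 is inferred from which languages are named as short and from the counts reported after the extra ratings.

```python
TARGET = 500  # assumption: not stated explicitly in the ticket

# Counts copied from the two tables above.
before = {"ar": 498, "bn": 560, "ceb": 456, "cs": 495, "en": 633, "vi": 686}
after = {"ar": 493, "bn": 556, "ceb": 449, "cs": 480, "en": 599, "vi": 681}

# Results dropped per language by the unsuitableArticleType filter.
removed = {lang: before[lang] - after[lang] for lang in before}

# Languages still below the assumed target, and by how much.
shortfall = {lang: TARGET - n for lang, n in after.items() if n < TARGET}

print(removed)    # {'ar': 5, 'bn': 4, 'ceb': 7, 'cs': 15, 'en': 34, 'vi': 5}
print(shortfall)  # {'ar': 7, 'ceb': 51, 'cs': 20}
```

Note that ar, ceb and cs were already slightly below 500 before filtering, so part of the shortfall predates the removal of the unsuitable page types.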

aaaand done

select count(distinct resultFilePage), langCode
from unillustratedArticles
join imageRecommendations
  on unillustratedArticles.id = imageRecommendations.unillustratedArticleId
where rating is not null
  and unsuitableArticleType = 0
group by langCode;
+--------------------------------+----------+
| count(distinct resultFilePage) | langCode |
+--------------------------------+----------+
|                            500 | ar       |
|                            556 | bn       |
|                            500 | ceb      |
|                            501 | cs       |
|                            599 | en       |
|                            681 | vi       |
+--------------------------------+----------+
Cparle claimed this task.
Cparle moved this task from Doing to Needs QA on the Structured-Data-Backlog (Current Work) board.