[M] Another round of manual evaluation for SLIS
Closed, Resolved · Public

Assigned To

Authored By

	CBogen
	Feb 28 2023, 6:33 PM

Description

Based on the first round of section-level image suggestions evaluation results, we decided to do more work to remove image suggestions for sections with tables and lists (T330841, T330848), remove image suggestions for short sections (T329282), and remove image suggestions for sections that already have an image (T330516).

We also decided to do more work to refine P18, P373, and lead image based suggestions (T330773).

After those tickets are done, this ticket is to do another round of internal and ambassador manual evaluation using https://section-image-suggestions-test.toolforge.org/, but this time we would like to include the number and % of clicks on "this section should not have an image".

Acceptance Criteria

Update the test data with the results of T330516, T329282, T330841, T330848, and T330773
Include the option "This image is offensive" in addition to "Good", "Bad", and "This section shouldn't have an image"
Run another round of evaluation using https://section-image-suggestions-test.toolforge.org/ with updated data
The outputs should include:

wiki
% good intersection
% good alignment
% good P18/P373/lead image
% sections that should not have an image
% offensive images
total rated suggestions

"% good" should mean the % of the total rated suggestions that were rated good, *not* the % of only those suggestions that remain after excluding "this section should not have an image" and "this image is offensive". Therefore, ratings for "this section should not have an image" and "this image is offensive" should be counted in the total rated suggestions.

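The "% good" rule above can be sketched as follows. This is a minimal illustration, not the evaluation tool's code: the label strings and the `summarize` function are hypothetical, and "unsure" ratings are left out of the total, as the note under Results states. In practice the same calculation would be run separately per wiki and per suggestion type.

```python
from collections import Counter

# Hypothetical rating labels; the values the tool actually stores are not given in this ticket.
GOOD, BAD, NO_IMAGE, OFFENSIVE, UNSURE = "good", "bad", "no_image", "offensive", "unsure"


def summarize(ratings):
    """Return percentages over all rated suggestions, excluding 'unsure' ratings."""
    counts = Counter(r for r in ratings if r != UNSURE)
    total = sum(counts.values())
    if total == 0:
        return {"total_rated": 0}

    def pct(label):
        # The denominator is every rated suggestion, including
        # "no image" and "offensive" ratings.
        return round(100 * counts[label] / total, 1)

    return {
        "pct_good": pct(GOOD),
        "pct_no_image": pct(NO_IMAGE),
        "pct_offensive": pct(OFFENSIVE),
        "total_rated": total,
    }


# Made-up ratings for illustration only, not evaluation data.
sample = [GOOD] * 6 + [BAD] * 2 + [NO_IMAGE] + [OFFENSIVE] + [UNSURE] * 3
summary = summarize(sample)
print(summary)
```

With these made-up ratings, 6 of the 10 counted suggestions are good, so "% good" is 60 rather than the 75 that would result from dropping the "no image" and "offensive" ratings from the denominator.
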
Instructions for ambassadors:

Evaluate 500 random section-level image suggestions across 500 different, randomly selected articles, per wiki.
Ambassadors will need to count and keep track of how many suggestions they have evaluated in their language -- the tool will not capture that.
For each result for each unillustrated article, manually decide whether the match is good or bad; alternatively note if the image is offensive or if the section should not have an image. You can also choose "unsure" if you are not confident in your selection.
General comments or questions during evaluation can be posted as comments in this ticket.
The estimated time for manual evaluation is 3 hours for the 500 images. However, if 3 hours pass before you finish the evaluation, please leave a comment and we can decide whether to continue with further evaluation.

Results

wiki	% good alignment	% good intersection	% good P18/P373/lead image	% sections that should not have an image	% offensive images	total rated suggestions
arwiki	71	91	54	6	0.4	511
bnwiki	28	86	26	24	0	204
cswiki	41	77	23	13	0	128
enwiki	76	96	75	3	0	75
eswiki	60	67	48	27	0.2	549
frwiki	N.A.	N.A.	100	N.A.	N.A.	3
idwiki	66	81	37	37	0	315
ptwiki	92	100	84	0	0	85
ruwiki	73	89	69	4	0	250
overall	64	86	57	14	0.07	2,120

NOTE: total rated suggestions exclude those marked as unsure.
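
As a quick consistency check on the table above, the per-wiki totals can be summed and the percentages read back as approximate counts. This is only a sketch: the dictionary names are hypothetical, the numbers are copied from the table, and the counts are approximate because the published percentages are already rounded.

```python
# Per-wiki "total rated suggestions", copied from the Results table above.
totals = {
    "arwiki": 511, "bnwiki": 204, "cswiki": 128, "enwiki": 75, "eswiki": 549,
    "frwiki": 3, "idwiki": 315, "ptwiki": 85, "ruwiki": 250,
}

# Per-wiki "% sections that should not have an image" (frwiki is N.A., so it is omitted).
pct_no_image = {
    "arwiki": 6, "bnwiki": 24, "cswiki": 13, "enwiki": 3, "eswiki": 27,
    "idwiki": 37, "ptwiki": 0, "ruwiki": 4,
}

# Sum of the per-wiki totals, to compare with the "overall" row.
overall_total = sum(totals.values())

# Approximate counts only: the table's percentages are rounded.
approx_no_image = {
    wiki: round(totals[wiki] * pct / 100) for wiki, pct in pct_no_image.items()
}

print(overall_total)
print(approx_no_image)
```

The summed total matches the 2,120 reported in the "overall" row.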

Related Objects

		Status	Subtype	Assigned	Task
		Resolved		None	T311814 [EPIC] Section-level image suggestions data pipeline
		Resolved		mfossati	T330784 [M] Another round of manual evaluation for SLIS

Event Timeline

CBogen created this task. · Feb 28 2023, 6:33 PM

Restricted Application added a subscriber: Aklapper. · Feb 28 2023, 6:33 PM

CBogen added a parent task: T311814: [EPIC] Section-level image suggestions data pipeline. · Feb 28 2023, 6:34 PM

mfossati subscribed. · Mar 1 2023, 9:16 AM

CBogen updated the task description. · Mar 1 2023, 4:03 PM

CBogen mentioned this in T330773: [L] Make refinements to and incorporate P18 based section-level image suggestions. · Mar 1 2023, 4:52 PM

CBogen edited projects, added Structured-Data-Backlog (Current Work); removed Structured-Data-Backlog. · Mar 6 2023, 5:45 PM

CBogen moved this task from Incoming to Ready for Estimation on the Structured-Data-Backlog (Current Work) board.

CBogen updated the task description. · Mar 14 2023, 5:10 PM

Blocked on T330516, T329282, T330841, T330848, and T330773

CBogen renamed this task from [M] Another round of manual evaluation to calculate how many sections that shouldn't have images still generate suggestions to [M] Another round of manual evaluation for SLIS. · Mar 23 2023, 2:07 PM

CBogen updated the task description.

CBogen mentioned this in T330852: Skip p18-based image suggestions from newcomer tasks. · Mar 23 2023, 2:11 PM

CBogen updated the task description. · Mar 30 2023, 1:18 PM

CBogen updated the task description.

CBogen updated the task description. · Apr 11 2023, 1:08 PM

CBogen updated the task description. · Apr 11 2023, 1:13 PM

KStoller-WMF mentioned this in T329275: Section-level images: Create task type. · Apr 13 2023, 8:50 PM

Unblocking this to tackle the tool change (see second AC) first.

mfossati updated the task description. · Apr 17 2023, 3:16 PM

mfossati updated the task description. · Apr 20 2023, 1:20 PM

Data and tool ready, evaluation elicited. Moving to QA for the evaluation period (@Etonkovidova, you can safely skip this ticket).

Dyolf77_WMF subscribed. · Apr 20 2023, 3:45 PM

BAPerdana-WMF subscribed. · Apr 25 2023, 3:08 PM

When testing the section image suggestions in Spanish, some of the suggestions are repeated even after they have already been completed (not skipped). For example, this is one of the repeated articles/sections, and the suggested image is always the same.

mfossati updated the task description. · May 2 2023, 3:35 PM

KStoller-WMF subscribed. · May 2 2023, 5:59 PM

@mfossati thanks for posting the results! Is there a spreadsheet of raw data I can look at too?
Am I correct in interpreting that "intersection" suggestions are both a topic and alignment fit?

mfossati updated the task description. · May 2 2023, 6:37 PM

mfossati updated the task description. · May 2 2023, 7:00 PM