Image suggestion evaluation August 2020
Closed, Resolved (Public)

Description

In the parent task (T256081), @Miriam generated lists of image recommendations for six languages. In this task, the following people will evaluate the recommendations in the lists:

I put the files in tabs in this spreadsheet: https://docs.google.com/spreadsheets/d/120ux_OPnqGWwrufgAvoBFBqDiPGquK4Xgd4UevLFuu0/edit#gid=778067505 (it is also possible to view all top images with their articles at once via this link). In our first pass, we will evaluate the first 50 articles in each list. I sorted the articles randomly so we are evaluating a representative group.

We'll classify the "top image" into these categories, along with explanatory comments where useful:

Classification	Explanation
2	Great match for the article, illustrating the thing that is the title of the article; e.g. the article is "Food" and it is an image of food.
1	Good match, but difficult to confirm for the article unless the user has some context, and would need a good caption; e.g. the article is "Food" and it is an image of a famous chef.
0	Not a fit for the article at all; e.g. the article is "Food" and the image is a car.
-1	Image is correct for the subject, but does not match the local culture; e.g. the article is "Food" and the image is a specific food from a specific culture that is not recognizable in the local culture.
-2	Misleading image that a newcomer could accidentally think is correct; e.g. the article is "Taco" and the image is a burrito.
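
For anyone who wants to tally a sheet programmatically, here is a minimal sketch of the scale above as a Python enum, with a helper that computes the share of each rating. The names `Rating` and `rating_shares` are illustrative assumptions, not part of the spreadsheet or of any existing tooling.

```python
from collections import Counter
from enum import IntEnum


class Rating(IntEnum):
    """Evaluation scale from the table above (member names are illustrative)."""
    GREAT_MATCH = 2
    GOOD_NEEDS_CONTEXT = 1
    NOT_A_FIT = 0
    CULTURAL_MISMATCH = -1
    MISLEADING = -2


def rating_shares(scores):
    """Return the share of each rating in a list of integer scores."""
    ratings = [Rating(s) for s in scores]  # raises ValueError on a score outside the scale
    counts = Counter(ratings)
    total = len(ratings)
    if total == 0:
        return {}
    return {r: counts[r] / total for r in Rating}
```

Calling `rating_shares` on the 50 scores of one sheet gives the fraction of 2s, 1s, and so on for that wiki.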

Details

Due Date: Aug 27 2020, 7:00 AM

Related Objects

Status	Assigned	Task
Resolved	CBogen	T254768 [EPIC] Image recommendations proof-of-concept phase
Resolved	Miriam	T256081 Image matching algorithm
Resolved	MMiller_WMF	T260857 Image suggestion evaluation August 2020

Event Timeline

@revi @Trizek-WMF @PPham @Dyolf77_WMF @Urbanecm -- we're ready to start on this task, like we discussed in our meetings this week. Let's evaluate the first 50 articles on each of our sheets. Please post any questions here. And when you're finished, please post a comment here leaving any overall notes or reactions to the algorithm.

MMiller_WMF set Due Date to Aug 27 2020, 7:00 AM.Aug 19 2020, 9:07 PM

MMiller_WMF moved this task from Incoming to In Progress on the Growth-Team (Sprint 0 (Growth Team)) board.

Urbanecm added a project: User-Urbanecm.Aug 19 2020, 10:06 PM

Dyolf77_WMF added a project: User-Dyolf77.Aug 20 2020, 11:57 AM

Urbanecm_WMF edited projects, added User-Urbanecm_WMF; removed User-Urbanecm.Aug 21 2020, 11:24 AM

Urbanecm_WMF updated the task description. (Show Details)

Hopefully you all aren't that far: because I'm lazy, I built a script that generates a page that shows both page title and the image in one place. You can view your own generated file at https://people.wikimedia.org/~urbanecm/growth-team/image-evaluation-aug-2020-T260857/. Hope this helps!
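
A rough sketch of how such a page could be generated, for illustration only. This is not the actual script behind the link above. The function name `render_review_page` and the input shape (pairs of article title and Commons file name) are assumptions. The image URL uses the Commons `Special:FilePath` redirect with a `width` parameter.

```python
import html
from urllib.parse import quote


def render_review_page(rows):
    """Build an HTML page with one block per (article title, Commons file name) pair."""
    blocks = []
    for title, filename in rows:
        # Special:FilePath redirects to the file itself; width asks for a scaled version.
        src = (
            "https://commons.wikimedia.org/wiki/Special:FilePath/"
            + quote(filename)
            + "?width=300"
        )
        blocks.append(
            f"<section><h2>{html.escape(title)}</h2>"
            f'<img src="{html.escape(src)}" alt="{html.escape(filename)}"></section>'
        )
    return "<!DOCTYPE html>\n<html><body>\n" + "\n".join(blocks) + "\n</body></html>"
```

Titles and file names are escaped so that characters such as `&` or quotes in a title cannot break the markup.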

Urbanecm_WMF updated the task description. (Show Details)Aug 22 2020, 4:58 PM

Urbanecm_WMF updated the task description. (Show Details)

Some general comments after finishing the evaluation:
Most of the suggestions are on animals/flowers, astronomical bodies, biographies and geographical locations in my case:

Animals/flowers: Mostly wrong suggestions, because the algorithm suggests images of species in the same genus or family, but they are different species.
Astronomical bodies: Hard to evaluate since my knowledge of this subject is narrow, but my impression is that none in this group is a "2".
Biographies: A high rate of correct suggestions, perhaps because you can just match the name of the article with the file's name.
Geographical locations: The highest rate of correct suggestions, maybe for the same reason as biographies.

If you want to confirm whether it fits or not, you have to actually follow the file to other languages, read the articles, google search the subject in some cases... overall most of the evaluation would require more than two clicks. Some of them would not be suitable for newcomers - it could be hard for them to navigate through different languages and projects (wikidata, wikispecies, wikicommons, etc.).

Of course it is a promising idea, but let's not forget that the newcomer is not as dedicated to Wikipedia as us, so perhaps they won't want to waste so much time clicking around, but instead want a more immediate and direct result/edit? But this type of suggestion is definitely valuable to us experienced users lol.

• Charlotte moved this task from Needs Triage to Tracking on the Wikipedia-Android-App-Backlog board.Aug 24 2020, 3:33 PM

Dyolf77_WMF moved this task from Backlog to In review on the User-Dyolf77 board.Aug 24 2020, 4:20 PM

Finished the evaluation of ar.wiki suggestions. Suggestions can be improved by:

hiding low quality/resolution images (they are used as icons),
hiding images with DR (Deletion Request) tags,
looking for local images in Other files section (especially for translatable maps/diagrams).
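
The first two suggestions could be sketched as a simple filter over candidate images. Everything here is a hypothetical illustration: the `Candidate` fields, the `keep_candidate` name, and the 200-pixel threshold are assumptions, not values from the evaluation or from the matching algorithm. The third suggestion (looking for local images) is not covered.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for one suggested image."""
    filename: str
    width: int
    height: int
    has_deletion_request: bool


MIN_SIDE_PX = 200  # illustrative threshold, not a value from the evaluation


def keep_candidate(candidate, min_side=MIN_SIDE_PX):
    """Drop images tagged with a deletion request or too small to be more than an icon."""
    if candidate.has_deletion_request:
        return False
    return min(candidate.width, candidate.height) >= min_side
```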

Trizek-WMF updated the task description. (Show Details)Aug 26 2020, 2:05 PM

Urbanecm unsubscribed.Aug 26 2020, 2:08 PM

In T260857#6402309, @Urbanecm_WMF wrote:

Hopefully you all aren't that far: because I'm lazy, I built a script that generates a page that shows both page title and the image in one place. You can view your own generated file at https://people.wikimedia.org/~urbanecm/growth-team/image-evaluation-aug-2020-T260857/. Hope this helps!

Thank you Martin! This is really helpful!

Concerning the samples we have to review, after a meeting with Revi and Phuong, we all agreed on the lack of diversity. For instance, Phuong and I have a lot of asteroids, Phuong also says that she has a lot of animals. It doesn't seem to be well balanced. @Miriam, is it normal to have this lack of diversity?

In T260857#6412679, @Trizek-WMF wrote:

In T260857#6402309, @Urbanecm_WMF wrote:

Hopefully you all aren't that far: because I'm lazy, I built a script that generates a page that shows both page title and the image in one place. You can view your own generated file at https://people.wikimedia.org/~urbanecm/growth-team/image-evaluation-aug-2020-T260857/. Hope this helps!

Thank you Martin! This is really helpful!

I'm glad you like it! A code author is always happy to hear someone used it and liked it ;).

Concerning the samples we have to review, after a meeting with Revi and Phuong, we all agreed on the lack of diversity. For instance, Phuong and I have a lot of asteroids, Phuong also says that she has a lot of animals. It doesn't seem to be well balanced. @Miriam, is it normal to have this lack of diversity?

Maybe I'm highly tolerant of a lack of diversity, but I don't see one in my set of images.

Hi all. A quick note that Miriam will be back at work on 2020-09-07 and will likely not be able to respond to your questions until then.

Urbanecm_WMF moved this task from Untriaged to Waiting on review on the User-Urbanecm_WMF board.Aug 27 2020, 8:29 AM

Urbanecm_WMF updated the task description. (Show Details)

@Trizek-WMF I'm done with the first 50 items. Let me know if there is anything more I should do.

We are all done with our items (Habib helped me with French). I'll let Marshall check on the results and finish his batch.

Is it possible to refine by section title to illustrate a section? For instance, an article about a city in which there is a "tourism" section: is the recommendation tool able to find a picture of the local beach (for example) for this section?

@revi @Urbanecm -- could you please put your overall comments on the results here?

@Miriam -- we are finished evaluating this round of image recommendations. For six languages, we evaluated 50 random matches. The results are in the sheets in this workbook: https://docs.google.com/spreadsheets/d/120ux_OPnqGWwrufgAvoBFBqDiPGquK4Xgd4UevLFuu0/edit#gid=383843253. There is a graph of results in the "Summary" tab.

We classified the matches into these classifications:

Classification	Explanation
2	Clear match for the article, illustrating the thing that is the title of the article; e.g. the article is "Food" and it is an image of food.
1	Appropriate match, but difficult to confirm for the article unless the user has some context, and would need a good caption; e.g. the article is "Food" and it is an image of a famous chef.
0	Not a fit for the article at all; e.g. the article is "Food" and the image is a car.
-1	Image is correct for the subject, but does not match the local culture; e.g. the article is "Food" and the image is a specific food from a specific culture that is not recognizable in the local culture.
-2	Misleading image that a newcomer could accidentally think is correct; e.g. the article is "Oak tree" and the image is an elm tree.

Here are the toplines:

Depending on the wiki, 20-40% of matches were 2s. We have to talk about what minimum accuracy level we think is acceptable for the newcomer experience.
The number of 1s raises many design questions: how much information can we give users to investigate these matches? For instance, if it's the article on Albert Einstein in Arabic Wikipedia, and the suggested image is of Albert Einstein's childhood home, and the name and description of that image are in German, how can the user determine that it's a good match, and not just a random house?
Depending on the wiki, -2s could be up to 30% of the matches. These seem to be caused largely by this phenomenon: there is an unillustrated article about a specific butterfly species (or asteroid or whatever). Some other wiki has an article about that butterfly, and it has a "butterflies" navbox at the bottom, which uses a certain butterfly image. All butterfly articles in that wiki therefore have that one butterfly image at the bottom, and so it is erroneously recommended for the wrong species.
The specific sheets for each language contain many notes that will help us refine this algorithm. I recommend reading through all of them to discover patterns that can be addressed.
There are also comments above, in this task, listing some clear areas for improvement from the people who did the evaluation.
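
One possible way to catch the navbox phenomenon described in the third point is to flag images that appear on many articles of the same source wiki, since an image shared by every butterfly article is more likely to come from a template than to depict one species. This is only a sketch of that idea, not part of the current matching algorithm. The function name, the input shape (article title mapped to the images used on it), and the `max_articles` threshold are assumptions.

```python
from collections import Counter


def likely_template_images(images_by_article, max_articles=5):
    """Return images used on more than max_articles articles of one source wiki."""
    usage = Counter()
    for images in images_by_article.values():
        usage.update(set(images))  # count each image at most once per article
    return {image for image, count in usage.items() if count > max_articles}
```

Images in the returned set could then be excluded from recommendations, or ranked lower, before a match is shown to a newcomer.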

Here is the graph from the "Summary" tab:

MMiller_WMF mentioned this in T256081: Image matching algorithm.Sep 8 2020, 10:10 PM

MMiller_WMF updated the task description. (Show Details)Sep 9 2020, 8:37 PM

MMiller_WMF mentioned this in T263374: Image suggestion evaluation September 2020.Sep 21 2020, 4:02 AM

Dyolf77_WMF moved this task from In review to Ended on the User-Dyolf77 board.Sep 21 2020, 9:48 AM

Urbanecm_WMF moved this task from Waiting on review to Done on the User-Urbanecm_WMF board.Oct 16 2020, 12:50 AM

Urbanecm unsubscribed.Oct 23 2020, 5:41 PM

@MMiller_WMF: Hi, the Due Date set for this open task was more than two months ago.
Can you please either update or reset the Due Date (by clicking Edit Task), or set the status of this task to resolved in case this task is done? Thanks.

CBogen added a project: Image-Suggestions.Dec 9 2020, 10:06 PM

leila moved this task from FY2020-21-Research-July-September to FY2020-21-Research-October-December on the Research board.Jan 4 2021, 7:31 PM

leila edited projects, added Research (FY2020-21-Research-October-December); removed Research (FY2020-21-Research-July-September).

We've finished with this task, and we're moving on to continuing the work in other image tasks. Thank you!

	F32251618: image.png
	Sep 8 2020, 10:09 PM
