Input
- T311745: [EPIC] Section topics data pipeline's output, i.e., a dataset with (page, section title, section topic, relevance score) columns (plus others)
- T299781: [EPIC] Image suggestions backend's image_suggestions_wikidata_data Hive table
- relevant slices of Structured Data on Commons, depending on T311831: [SPIKE] Consider Structured Data on Commons to gather additional image QIDs
- additional Wikidata image properties, depending on T311832: [SPIKE] Consider other Wikidata properties to gather additional image QIDs
- Commons images from the Data Lake, i.e., the wmf_raw.mediawiki_page Hive table
- Image links from the Data Lake, i.e., the wmf_raw.mediawiki_imagelinks Hive table
- machine-learned section alignments as per Research's work
Output
Given a wiki, output a dataset with the following data:
- page
- section title
- section visual topic
- Commons image of the visual topic
- other wikis where the image suggestion was found
- relevance signal, i.e., coverage of the given suggestion across wikis
- explanation signal, i.e., Wikidata property used to suggest the image
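A minimal sketch of what one output row and the relevance signal could look like. Everything here is an assumption for illustration: the class name `SectionImageSuggestion`, its field names, the `coverage` helper, and the reading of "coverage" as the fraction of considered wikis where the suggestion was found are all hypothetical, not the pipeline's actual schema or scoring.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class SectionImageSuggestion:
    """One output row. Field names are hypothetical, mirroring the list above."""
    page: str
    section_title: str
    section_visual_topic: str      # assumed to be a Wikidata QID
    commons_image: str             # Commons file name
    found_on: Tuple[str, ...]      # other wikis where the suggestion was found
    relevance: float               # coverage across wikis, in [0, 1]
    explanation: str               # Wikidata property used to suggest the image


def coverage(found_on: Iterable[str], all_wikis: Iterable[str]) -> float:
    """Assumed relevance: share of the considered wikis where the suggestion occurs."""
    considered = set(all_wikis)
    if not considered:
        return 0.0
    return len(set(found_on) & considered) / len(considered)


# Made-up example values, for illustration only.
wikis = ["enwiki", "itwiki", "frwiki", "dewiki"]
found = ("itwiki", "frwiki")
row = SectionImageSuggestion(
    page="Example page",
    section_title="Example section",
    section_visual_topic="Q1",
    commons_image="Example.jpg",
    found_on=found,
    relevance=coverage(found, wikis),
    explanation="P18",
)
```

With these made-up values the suggestion is found on two of the four considered wikis, so `row.relevance` is 0.5. Wikis in `found_on` that are not among the considered ones are ignored, which keeps the score within [0, 1].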
This epic keeps track of all the relevant tasks.