Commons:Batch uploading/Fonds Ancely
This upload is part of a partnership between Wikimédia France and the Library of Toulouse. It consists of 2085 public domain files. You may see general notes and work in progress on User:Jean-Frédéric/Ancely.
The metadata is held in an OAI-PMH repository. The code explores it and retrieves records; then, where applicable, the various fields are matched against a manual, community-curated alignment of Commons categories and tags. The result is fed to a data ingestion template which translates the metadata to {{Artwork}}. The actual upload is made with Pywikipedia-rewrite by User:AncelyBot.
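As an illustration of the retrieval step, here is a minimal sketch of how an OAI-PMH `ListRecords` response could be turned into per-record field dictionaries. It is not the code used for this upload: it assumes the repository serves the standard `oai_dc` (Dublin Core) format, and the sample record and its identifier are made up for the example.

```python
import xml.etree.ElementTree as ET

# Standard namespaces of an OAI-PMH response carrying Dublin Core metadata.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def parse_records(xml_text):
    """Return one dict per record, mapping each field name to a list of values."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        container = record.find("oai:metadata/oai_dc:dc", NS)
        if container is None:  # e.g. deleted records carry no metadata
            continue
        fields = {}
        for element in container:
            name = element.tag.split("}", 1)[-1]  # strip the "{namespace}" prefix
            value = (element.text or "").strip()
            if value:
                fields.setdefault(name, []).append(value)
        records.append(fields)
    return records


# Hypothetical response, for illustration only (not taken from the real repository).
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>hypothetical:B315556101_A_EXAMPLE_001</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example title</dc:title>
          <dc:subject>Montagne</dc:subject>
          <dc:subject>Costume</dc:subject>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

records = parse_records(SAMPLE)
```

Keeping every field as a list means repeated fields (several subjects, several creators) survive until the alignment step decides what to do with them.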
In its current state, the categorisation system with the alignment outputs 31,801 categories (1,694 distinct). The drawback is that many are high-level categories (“Shawls”, “Men”, etc.).
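The alignment step can be pictured as a simple lookup. The sketch below is only an assumption about its shape: the `ALIGNMENT` table, its French keywords and the helper names are hypothetical, and the real alignment is the community-curated one maintained on the wiki.

```python
# Hypothetical alignment table: source keyword -> Commons categories.
ALIGNMENT = {
    "Montagne": ["Mountains in art"],
    "Costume": ["National costumes in art"],
    "Homme": ["Men in art"],
}


def categories_for(subjects, alignment):
    """Collect the categories of all aligned subjects, in order, without repeats."""
    categories = []
    for subject in subjects:
        for category in alignment.get(subject, []):  # unaligned subjects add nothing
            if category not in categories:
                categories.append(category)
    return categories


def render_categories(categories):
    """Render the categories as wikitext, one link per line."""
    return "\n".join("[[Category:%s]]" % category for category in categories)


example = categories_for(["Montagne", "Homme", "Montagne", "Inconnu"], ALIGNMENT)
wikitext = render_categories(example)
```

A lookup like this explains why broad keywords end up as broad categories: whatever granularity the alignment has is what every matching file receives.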
- Ingestion template: User:Jean-Frédéric/Ancely/Ingestion
- Source code: GitHub
- Test file: File:Pyrénées - Jasque Esterlet, guide aux Eaux-Bonnes - Marchande de beurre aux Eaux-Bonnes - Fonds Ancely - B315556101 A PINGRET 014.jpeg
Looking forward to your thoughts, Jean-Fred (talk) 22:49, 6 March 2013 (UTC)
Opinions
- Uploaded five more − see Special:ListFiles/AncelyBot Jean-Fred (talk) 01:14, 16 March 2013 (UTC)
- Uploaded fifteen more − and I will continue uploading files until my demands are met! Jean-Fred (talk) 00:23, 19 March 2013 (UTC)
- Support. Everything looks fine to me (maybe a bit overcategorised). --PierreSelim (talk) 14:24, 20 March 2013 (UTC)
- Ok, uploading 100 right now. Jean-Fred (talk) 21:06, 11 April 2013 (UTC)
- Looks very good. The only thing that worries me a bit is the number of categories per image. That might become a problem. Please upload more! Multichill (talk) 10:39, 13 April 2013 (UTC)
- Oppose for now: we forgot to finish the Creator mapping at User:Jean-Frédéric/Ancely/Creator --PierreSelim (talk) 12:02, 25 April 2013 (UTC)
- Fixed Jean-Fred (talk) 23:08, 7 May 2013 (UTC)
- Uploaded the first 350. Jean-Fred (talk) 23:08, 7 May 2013 (UTC)
- Uploaded the first 500. Jean-Fred (talk) 13:04, 8 May 2013 (UTC)
- Uploaded the first 800. Jean-Fred (talk) 14:05, 10 May 2013 (UTC)
- Made it 1,000. Jean-Fred (talk) 23:15, 12 May 2013 (UTC)
- Done. 2041 files uploaded + 33 dupes + 11 errors = 2085 files, the size of the corpus. Jean-Fred (talk) 14:49, 24 May 2013 (UTC)
Conclusion
Dupes
The following files were already on Commons − we might want to update their file descriptions (current: 33)
Errors
The following files failed to upload (current: 11)
- Filename too long (5)
- B315556101_A_ESTAMPES_005
- B315556101_A_ESTAMPES_004
- B315556101_A_CARDEL_1
- B315556101_A_DUMEGE_024
- B315556101_A_DUMEGE_032
- Multi-page stuff (2)
- B315556101_A_CARDEL
- B315556101_A_JALON
- The Weird One™ − « AssertionError: XXX has neither 'pageid' nor 'missing' attribute » (6)
- 315556101 A PETIT 3 036 − File:Panorama_des_Pyrénées_Centrales_-_Vue_prise_du_sommet_du_Pic-du-Midi_de_Bigorre_-_Panorama_des_Pyrénées_Centrales_-_Vue_prise_du_sommet_du_Pic-du-Midi_de_Bigorre_-_Panorama_des_Pyrénées_Centrales_-_Vue..._-_Fonds_Ancely_-B315556101_A_PETIT_3_036.jpg
- B315556101_A_HOUBIGANT_045 − File:Lescar,_ancien_1er_évéché_du_Béarn_-_Chapiteaux_qui_existent_dans_l'Eglise_-_Pierre_tumulaire_de_Gui,_évêque_de_Lescar_au_XIIe_siècle_incrustée_aujourd'hui_(19_octobre_1854)_près_la_porte_latérale_de..._-_Fonds_Ancely_-_B315556101_A_HOUBIGANT_045.jpg
- B315556101 A DUMEGE 010
- B315556101 A JALON 001
- B315556101 A HOUBIGANT 058
- B315556101 A PETIT 1 007
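For the "filename too long" cases, one possible approach is to shorten the descriptive title while keeping the " - Fonds Ancely - identifier" suffix intact, as the truncated names listed above already do. The sketch below is an assumption, not the bot's code: `build_filename` is a hypothetical helper, and the 240-byte limit is an assumed value that should be checked against the wiki's actual configuration.

```python
MAX_TITLE_BYTES = 240  # assumed limit; check the target wiki's actual setting
ELLIPSIS = "..."


def build_filename(title, identifier, extension="jpg", max_bytes=MAX_TITLE_BYTES):
    """Build "title - Fonds Ancely - identifier.ext", shortening the title if needed."""
    suffix = " - Fonds Ancely - %s.%s" % (identifier, extension)
    budget = max_bytes - len(suffix.encode("utf-8"))  # bytes left for the title
    if budget <= len(ELLIPSIS):
        raise ValueError("identifier leaves no room for a title")
    if len(title.encode("utf-8")) <= budget:
        return title + suffix
    # Cut on bytes, then drop any partial multi-byte character left at the end.
    cut = title.encode("utf-8")[: budget - len(ELLIPSIS)]
    shortened = cut.decode("utf-8", errors="ignore").rstrip()
    return shortened + ELLIPSIS + suffix


short_name = build_filename("Pyrénées", "B315556101_A_X_001")
long_name = build_filename("é" * 300, "B315556101_A_X_001")
```

Measuring in UTF-8 bytes rather than characters matters here, since accented French titles take more bytes than their character count suggests.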
Categorisation statistics
Per category
30266 categories, 1760 distinct. Mean: 17.1965909091, Median: 2.0, Max: 1045, Min: 1
Top 10: [('Mountains in art', 1045), ('Men in art', 992), ('Women in art', 878), ('Trees in art', 780), ('Houses in art', 736), ('Pyrénées-Atlantiques', 693), ('Hautes-Pyrénées', 617), ('Pyrenees', 470), ('National costumes in art', 468), ('Rivers in art', 440)]
Bottom 10: [('Estrades', 1), ('Pierre Bayle', 1), ('Morlaàs', 1), ('Louis-François Couché', 1), ('Jean Racine', 1), ('Faience in France', 1), ('Marmite', 1), ('Corsica', 1), ('Dordogne River', 1), ('Esera River', 1)]
Per file
Mean: 14.5160671463, Median: 13.0, Max: 47, Min: 0
Top N: [('B315556101_A_LEVASSEUR_066', 47), ('B315556101_A_LEVASSEUR_068', 46), ('B315556101_A_LEVASSEUR_018', 44), ('B315556101_A_LEVASSEUR_056', 42), ('B315556101_A_LEVASSEUR_057', 42)]
Bottom N: [('B315556101_A_BERTHIER_010', 1), ('B315556101_A_BERTHIER_024', 0), ('B315556101_A_BERTHIER_021', 0), ('B315556101_A_BERTHIER_018', 0), ('B315556101_A_BERTHIER_013', 0)]
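Figures of this kind can be reproduced from a mapping of file identifiers to category lists. The following is a rough sketch under that assumption; `categorisation_stats` and the toy data are hypothetical and not the script that produced the numbers above.

```python
from collections import Counter
from statistics import mean, median


def categorisation_stats(file_categories, top=10):
    """Summarise category usage; file_categories maps identifier -> category list.

    Assumes at least one file and one category, otherwise mean() raises.
    """
    per_category = Counter(c for cats in file_categories.values() for c in cats)
    per_file = Counter({name: len(cats) for name, cats in file_categories.items()})
    return {
        "total": sum(per_category.values()),    # category assignments overall
        "distinct": len(per_category),          # different categories used
        "category_mean": mean(per_category.values()),
        "category_median": median(per_category.values()),
        "top_categories": per_category.most_common(top),
        "file_mean": mean(per_file.values()),
        "file_median": median(per_file.values()),
        "file_max": max(per_file.values()),
        "file_min": min(per_file.values()),
    }


# Toy data for illustration only.
stats = categorisation_stats(
    {
        "A": ["Mountains in art", "Men in art"],
        "B": ["Mountains in art"],
        "C": [],
    },
    top=1,
)
```

With these definitions, the per-category mean is the total divided by the number of distinct categories, and the per-file mean is the same total divided by the number of files, which is consistent with the means reported above.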
Assigned to | Job | Progress |
---|---|---|
Jean-Frédéric | Metadata pre-processing | Status: Done |
Jean-Frédéric, Symac, Léna, PierreSelim | Metadata alignment | Status: Done |
User:Jean-Frédéric | Upload | Status: Done |
(unassigned) | Dupes and errors processing | Status: To do |