Commons:Bots/Work requests

This is a page for requesting work to be done by a bot. It is also an appropriate place simply to put ideas for bots. However, be aware of the various tools available to all users that can accomplish the work without the need for a bot:

The Cat-a-lot gadget can be used for most category additions, removals, and moves.
The VisualFileChange tool can make mass changes to one author's uploaded files or to the files in a category, create a mass deletion request, and insert tags into file description pages (even copying from EXIF/metadata). It can also perform find-and-replace operations, with or without JavaScript regular expressions (regexp). A common use case is "I've changed my username". It's a web tool and can be launched directly.
AutoWikiBrowser can be used for large numbers of supervised and automatic edits.
The Commons:Batch uploading page can be used to request mass image uploads.

Please add {{Section resolved|1=~~~~}} when request is resolved.

	SpBot archives all sections tagged with {{Section resolved|1=~~~~}} after 7 days.

#	Bot request	Status	Comments	Participants	Last editor	Last edit (UTC)	Last bot operator to edit	Bot operator edit (UTC)
1	Hidden categories added as Category:Hidden categories		11	4	Enhancing999	2024-08-05 01:21	Fl.schmitt	2024-08-04 07:23
2	sorting files	Resolved	13	3	DaxServer	2024-07-28 17:08	DaxServer	2024-07-28 17:08
3	Revert additions to Category:History by Mitte27		7	5	Enhancing999	2024-06-30 11:06	Cryptic-waveform	2024-06-25 13:04
4	Convert Category:Photographs by Carol M. Highsmith to JPEG		28	7	DaxServer	2024-08-03 16:46	DaxServer	2024-08-03 16:46
5	Cities in Finland and China by month		2	2	DaxServer	2024-08-02 21:26	DaxServer	2024-08-02 21:26
6	Remove extraneous "I, " in author param of PD-self		10	5	Pelikana	2024-08-04 21:44	Jeff G.	2024-07-24 10:21
7	Images with borders (MTC)		4	2	DaxServer	2024-07-29 13:57	DaxServer	2024-07-29 13:57
8	Missing "-" in coordinates (MX)	Done	2	2	DaxServer	2024-08-01 12:08	DaxServer	2024-08-01 12:08
9	Add OCR output to jpg		3	2	Enhancing999	2024-08-04 13:47
10	Move "Historical images of" to "History of"		9	4	Adamant1	2024-08-10 07:27	Jeff G.	2024-08-04 10:20
11	P625 to P1259		5	3	Multichill	2024-08-13 17:21	DaxServer	2024-08-06 10:58
12	Media missing infobox template		22	4	Fl.schmitt	2024-09-04 18:31	Fl.schmitt	2024-09-04 18:31
13	Auto-addition of inferrable categories		2	1	Prototyperspective	2024-08-30 12:50
14	Generate a daily database report equivalent of Special:UncategorizedCategories		16	4	Prototyperspective	2024-08-26 14:16	Fl.schmitt	2024-08-26 12:36
15	file description cleanup: "Uploaded with Reworkhelper"		1	1	Enhancing999	2024-08-27 13:46
16	Rename and move files from category	Resolved	6	2	Fl.schmitt	2024-08-29 17:10	Fl.schmitt	2024-08-29 17:10
17	Add P1651 YouTube video ID structured data from "source" attribute of Filedesc template		8	4	Jeff G.	2024-09-04 21:58	Jeff G.	2024-09-04 21:58
18	Add missing Template:Location		7	3	Enhancing999	2024-09-01 18:55	DaxServer	2024-09-01 18:50


Hidden categories added as Category:Hidden categories


Category:Hidden categories is a system category populated by the __HIDDENCAT__ magic word.

However, some files and even categories add it as a regular category: [[Category:Hidden categories]]
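A minimal Python sketch of the conversion being asked for, assuming the bot works on raw page wikitext: it drops a manual [[Category:Hidden categories]] link (with or without a sort key) and adds the __HIDDENCAT__ magic word if the page lacks it. Links written with a leading colon, such as the piped link mentioned later in this thread, are plain links and are left alone. Whether a given category should be hidden at all is a separate judgment the sketch does not make; the function name is hypothetical.

```python
import re

# A categorizing link to Category:Hidden categories, with an optional sort key.
# [[:Category:Hidden categories|...]] (leading colon) is a plain link and does not match.
HIDDEN_CAT_LINK = re.compile(
    r"\[\[\s*[Cc]ategory\s*:\s*[Hh]idden[ _]categories\s*(?:\|[^\]]*)?\]\]\n?"
)
MAGIC_WORD = "__HIDDENCAT__"


def replace_manual_hidden_category(wikitext: str) -> str:
    """Swap a manual [[Category:Hidden categories]] for the __HIDDENCAT__ magic word."""
    if not HIDDEN_CAT_LINK.search(wikitext):
        return wikitext  # page is not affected
    cleaned = HIDDEN_CAT_LINK.sub("", wikitext)
    if MAGIC_WORD not in cleaned:
        cleaned = MAGIC_WORD + "\n" + cleaned
    return cleaned
```

Running it twice on the same page changes nothing the second time, which matters for a bot that re-scans daily.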

To find some: [1] (currently 468 in the category namespace). Enhancing999 (talk) 13:22, 2 June 2024 (UTC)

I've reduced the numbers with Com:Cat-a-lot. The rest should probably be gone through manually. Jonteemil (talk) 23:04, 3 June 2024 (UTC)

Shouldn't they be replaced with __HIDDENCAT__? This search finds those lacking it. Enhancing999 (talk) 23:18, 3 June 2024 (UTC)

I'm not sure all 128 categories really should be hidden. That's why I suggest going through them manually. Jonteemil (talk) 11:52, 4 June 2024 (UTC)

Currently 54 hits.

Support fixing this. [[Category:Hidden categories]] should NOT appear. — Preceding unsigned comment added by Taylor 49 (talk • contribs) 14:12, 26 June 2024 (UTC)

I think this is done now. I've edited most of the remaining 43 categories using AWB. I was unsure about Category:Vector files with non-modifiable text, since there Category:Hidden categories is used as a piped link.

{{Section resolved|Fl.schmitt (talk) 10:36, 14 July 2024 (UTC)}} Fl.schmitt (talk) 10:36, 14 July 2024 (UTC)

Thanks for the help. I had done a few as well. While doing the change manually helps add more precise categories (like {{Source category}} or {{Usercat}}), I don't see an issue with systematically converting all uses going forward. Since July 14, a new use has been added: [2]. Maybe a bot that runs daily could include it too. Enhancing999 (talk) 11:54, 16 July 2024 (UTC)

Oops, sorry for the misunderstanding; I stopped reading too early :-) ... I've left a message on the user's talk page. Fl.schmitt (talk) 07:23, 4 August 2024 (UTC)

I asked R'n'B to include it in Russbot's tasks and cleaned up some of the noise in [3]. Enhancing999 (talk) 07:33, 4 August 2024 (UTC)

This doesn't need a bot. You can easily locate pages that have been added manually to Category:Hidden categories; see, e.g., https://quarry.wmcloud.org/query/85343. And I agree with Jonteemil that these pages should be reviewed manually; a bot has no way of knowing whether hiding the category is in fact appropriate. --R'n'B (talk) 23:52, 4 August 2024 (UTC)

The question is not whether we can find them, but whether we want to fix them manually. We can easily empty category redirects manually too, but we don't really want to.

Did you find any in the 500 that needed manual review? Enhancing999 (talk) 01:21, 5 August 2024 (UTC)

sorting files


Please help me sort the files in the subcategories of Category:Photographs in the Golestan Palace Library by number. Sort keys should be three digits, as there might be more than a hundred files in each album. Hanooz 15:18, 7 June 2024 (UTC)

@Hanooz: this seems to be done, too - is this correct? If not, please comment. Thank you!

Section not resolved. Fl.schmitt (talk) 10:36, 14 July 2024 (UTC)

It's not, unfortunately. Hanooz 16:10, 14 July 2024 (UTC)

OK, I've removed the resolved template (sorry, I didn't understand at first that you want to sort the files inside the subcategories, not into the categories...). --Fl.schmitt (talk) 17:07, 14 July 2024 (UTC)

@Hanooz Is this the format: [4] [5]? -- DaxServer (talk) 19:39, 14 July 2024 (UTC)

008.2 (or 008-2) for File:Golestan Palace Album No. 100-8.2.jpg and 008.1 (or 008-1) for File:Golestan Palace Album No. 100-8.1.jpg. What comes after the dot (1 or 2) is recto/verso. Hanooz 19:59, 14 July 2024 (UTC)
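A sketch of how the sort keys described above could be derived, assuming every title follows the "Golestan Palace Album No. <album>-<number>.<side>.jpg" pattern from the examples; titles that don't match return None and would need a manual key. The helper names and the example category are hypothetical.

```python
import re

# Title shape taken from the examples above:
# "Golestan Palace Album No. <album>-<number>.<side>.jpg", with <side> 1 or 2 (recto/verso).
TITLE = re.compile(r"Golestan Palace Album No\. (\d+)-(\d+)\.([12])\.jpg$")


def golestan_sort_key(title: str):
    """Return a zero-padded key such as '008.2', or None if the title does not fit."""
    match = TITLE.search(title.replace("_", " "))
    if match is None:
        return None
    _album, number, side = match.groups()
    return f"{int(number):03d}.{side}"


def apply_sort_key(wikitext: str, category: str, key: str) -> str:
    """Set the sort key on the [[Category:<category>]] link (hypothetical helper)."""
    pattern = re.compile(
        r"\[\[\s*[Cc]ategory\s*:\s*" + re.escape(category) + r"\s*(?:\|[^\]]*)?\]\]"
    )
    return pattern.sub(lambda _m: f"[[Category:{category}|{key}]]", wikitext)
```

Zero-padding to three digits keeps page 8 before page 10 in the category's alphabetical ordering, and the recto/verso suffix keeps each pair together.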

@Hanooz Here is what I gather: https://commons.wikimedia.org/w/index.php?title=User:DaxServer/sandbox&oldid=899125871 from Petscan https://petscan.wmcloud.org/?psid=28923652. I omitted the first few, which do not have the pattern "Golestan_Palace_Album_No._" in the title. Please edit them manually, setting the desired sort key. If the table looks good, I can file for the bot and do the edits. Let me know -- DaxServer (talk) 13:58, 15 July 2024 (UTC)

Looks great to me. Thanks. Hanooz 16:00, 15 July 2024 (UTC)

Filed Commons:Bots/Requests/DaxBot (5) -- DaxServer (talk) 20:46, 15 July 2024 (UTC)
@Hanooz I believe the sorting is done. Can you verify and mark this resolved? -- DaxServer (talk) 13:33, 28 July 2024 (UTC)
Yes. Thank you for your assistance. Hanooz 14:07, 28 July 2024 (UTC)
Pleasure! -- DaxServer (talk) 17:08, 28 July 2024 (UTC)

Resolved

Revert additions to Category:History by Mitte27


Thousands of uncategorized files were added to the already-bloated Category:History. All of the edits I found were on 31 May 2024. Could someone please automatically revert these edits? Thanks. Cryptic-waveform (talk) 20:55, 24 June 2024 (UTC)

I don't think it's a good idea to revert it. My idea was to then move the files from "Category:History" to more specific categories. --Mitte27 (talk) 09:59, 25 June 2024 (UTC)

The current status is that thousands of files that were correctly marked as Uncategorized, and therefore easily visible to contributors doing a first round of categorization, are now erroneously categorized in a top-level category. Cryptic-waveform (talk) 13:04, 25 June 2024 (UTC)

@Mitte27: so when do you plan to move the images to more specific categories? This is clearly not an indefinite solution. —Matrix(!) {user - talk? - _uselesscontributions} 18:55, 26 June 2024 (UTC)

I sorted out some photos related to the history of Russia/USSR, but I have little understanding of American history, and most of the photos in the category are related to it. In any case, this category is better than none. --Mitte27 (talk) 22:29, 26 June 2024 (UTC)

There is no reason to ever place files into extremely broad categories like Category:History. Please do not remove {{Uncategorized}} unless you are able either to accurately place a file in the most specific categories available or to place it in a dedicated cleanup category. Pi.1415926535 (talk) 00:22, 27 June 2024 (UTC)

You could just use Cat-a-lot. I don't think adding all LOC or NARA images to "History" by default is a good idea. Enhancing999 (talk) 11:06, 30 June 2024 (UTC)

Convert Category:Photographs by Carol M. Highsmith to JPEG


Category:Photographs by Carol M. Highsmith is an excellent Library of Congress collection of very good images. Unfortunately, all those images are in TIFF format, which means that the average file size is 100-300 MB, which is incredibly large. This causes long loading times for even the preview image (let alone the actual file), and the TIFF format is not supported by most browsers or general applications. Wikipedia discourages using TIFF files for those reasons, and this reduces the likelihood of those excellent images being used.

Therefore, a bot should convert those TIFFs to JPEGs, copy the descriptions/categories, and make sure the files reference each other. Further, the categories on the TIFF files should be replaced with Category:LC TIF images with categorized JPGs. TheImaCow (talk) 21:59, 30 June 2024 (UTC)
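The wikitext side of this request could look like the following Python sketch; the image conversion itself (which would need an imaging library) is not shown. It assumes both pages carry an {{Information}} template: the JPG copy keeps the description and categories and links back to the TIFF through the "other versions" field, while the TIFF's categories are swapped for Category:LC TIF images with categorized JPGs. Function names are hypothetical, and whether to run such a conversion at all is still under discussion below.

```python
import re

TIF_MAINTENANCE = "[[Category:LC TIF images with categorized JPGs]]"
# Category links only; [[:Category:...]] with a leading colon is a plain link and is kept.
CATEGORY_LINK = re.compile(r"\[\[\s*[Cc]ategory\s*:[^\]]*\]\]\n?")
OTHER_VERSIONS = re.compile(r"\|\s*other[ _]versions\s*=")
INFORMATION = re.compile(r"\{\{\s*[Ii]nformation")


def add_other_version(wikitext: str, other_file: str) -> str:
    """Link other_file from the 'other versions' field of {{Information}}."""
    link = f"[[:File:{other_file}]] "
    if OTHER_VERSIONS.search(wikitext):
        # Prepend the link to the existing (possibly empty) field.
        return OTHER_VERSIONS.sub(lambda m: m.group(0) + link, wikitext, count=1)
    if INFORMATION.search(wikitext):
        # No field yet: add it right after the template name.
        return INFORMATION.sub(
            lambda m: m.group(0) + "\n|other versions=" + link, wikitext, count=1
        )
    return wikitext  # no {{Information}}: leave the page for a human


def pair_pages(tif_text: str, tif_name: str, jpg_name: str):
    """Return (jpg_page, tif_page) wikitext for a TIFF that has been converted to JPEG."""
    # The JPG copies the full description and categories and points back at the TIFF.
    jpg_page = add_other_version(tif_text, tif_name)
    # The TIFF keeps its description but swaps its categories for the maintenance one.
    stripped = CATEGORY_LINK.sub("", tif_text).rstrip()
    tif_page = add_other_version(stripped + "\n" + TIF_MAINTENANCE + "\n", jpg_name)
    return jpg_page, tif_page
```

Only the JPG then carries topical categories, which is the "TIF versions are just there" arrangement described later in this thread.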

@TheImaCow Thanks for finding this. I've filed for a bot: Commons:Bots/Requests/ImageConverterBot -- DaxServer (talk) 15:13, 1 July 2024 (UTC)

I didn't expect someone to reply to this so quickly, thank you!

I came across this series via Category:Aerial photographs of the United States and its subcategories, which contain many poorly categorized images from this collection. TheImaCow (talk) 16:40, 1 July 2024 (UTC)

LCCN2013631230.tif shows a JPG, and several JPG sizes are offered. Is this really needed? Enhancing999 (talk) 18:44, 1 July 2024 (UTC)

Hmm, I didn't notice that. It seems it is not necessary after all -- DaxServer (talk) 20:56, 1 July 2024 (UTC)

Maybe I'm blind, but where are those files offered? It's not the "Download/Use this file/Email a link" bar; all resolutions there only download the same low-quality preview generated by the MediaWiki software (which is shown on the file description page). TheImaCow (talk) 21:19, 1 July 2024 (UTC)

Below the image, there is a line:

The last one matches the TIFF. Enhancing999 (talk) 21:52, 1 July 2024 (UTC)

Oh, thanks, I see. However, this is very obscure, and when the file is embedded anywhere, it will always refer to the TIF version, so a separate JPG should probably still be uploaded, like the 220,000 other TIF files in Category:LC TIF images with categorized JPGs (or the 58,000 in Category:NARA TIF images with categorized JPGs).

But I don't have strong opinions on this. TheImaCow (talk) 22:11, 1 July 2024 (UTC)

Loading a file to test this -- DaxServer (talk) 05:51, 2 July 2024 (UTC)

Possibly support for TIFFs was less developed when they were uploaded. I wonder how all those thousands of duplicates are curated and how much volunteer time is lost by handling two copies of every image instead of one. WMF recently expressed its view on hosting files on Commons that aren't used on WMF sites [6]. Enhancing999 (talk) 09:22, 2 July 2024 (UTC)

Well, in theory, those TIF duplicates shouldn't need any curation, as they are supposed to be dumped into the massive categories mentioned above, and only linked from the description of the maintained JPG version.

The use of TIF is something I think is generally not needed for 99.9% of files; modern compression is more than good enough.

(I don't oppose eventually getting rid of the TIF duplicates, but there is not even consensus to delete de facto duplicates where one version is rotated differently by single degrees, or random low-quality TIF scans of generic text documents where the same scans are also uploaded as JPG, so forget it.) TheImaCow (talk) 13:28, 2 July 2024 (UTC)

Oddly, I can't figure out which one of the two maps is correct ;) Did you nominate the wrong one? For the text ones, I'd have nominated the JPG ones. The assumption that deletion doesn't save anything is incorrect: deletion reduces curation (even if theoretically none is needed, it still happens and wastes volunteer time), limits spamming of Special:Search, and can even save storage space, as files can be purged (from non-public view) or won't be exported twice when requested.

As technology changes, I think views on this evolve. NARA's approach might have been ideal 15 years ago, but other GLAMs that started only more recently use different approaches. Enhancing999 (talk) 12:13, 3 July 2024 (UTC)

Not sure what you mean. Both maps are exactly the same. JPG ones nominated instead? Ideally someone uploads a PDF and 307 files are replaced with one in the correct format for documents. I never said I oppose deletions, I said the exact opposite.

The NARA approach has actually changed - there have been at least two bulk uploads, one in 2011 and the other 2019.

The 2011 one uploaded nearly every image twice - one TIF+one JPG. The 2019 one uploaded only JPGs.

Looking at the NARA catalogue, files uploaded earlier often have TIF, JPG, and sometimes GIF versions for download. Images uploaded in 2019, presumably digitized later, have only high-resolution JPGs for download. TheImaCow (talk) 18:33, 3 July 2024 (UTC)

It's better to have the lossless files than a JPEG, as you can always make a JPEG from a lossless file, but you can't make a lossless file from a JPEG. Still, while we shouldn't delete the TIFFs, we should make JPEG options. Adam Cuerden (talk) 08:52, 4 July 2024 (UTC)

If we want to offer lossless files at reasonable sizes (2 MB vs 200 MB), we might want to consider offering PNGs instead of JPEGs -- DaxServer (talk) 08:57, 4 July 2024 (UTC)

@DaxServer: Please don't; PNG images look fuzzy when scaled down on WMF projects (due to design decisions discussed in phab:T192744). — 🇺🇦Jeff G. ツ please ping or talk to me🇺🇦 12:59, 13 July 2024 (UTC)

What should be done now? Is there any reason not to do what has already been done successfully with hundreds of thousands of NARA/LOC files? TheImaCow (talk) 11:57, 24 July 2024 (UTC)

What leads you to describe it as "successful"? How many edits had to be made because we have the same file twice? Enhancing999 (talk) 12:02, 24 July 2024 (UTC)

"Successful" because the TIF versions are dumped into the massive TIF categories linked above and linked in the "other_versions=" parameter of the Information template on their respective JPG versions, in case anyone needs them. JPG versions are maintained; TIF versions are just there. And there hasn't been much of an issue with it, as far as I can tell.

Please show some necessary manual edits that had to be done twice, because when done right, there aren't any. TheImaCow (talk) 17:02, 24 July 2024 (UTC)

Any edit on the version you consider secondary is a waste of curation energy. Wouldn't professionally managed archives clean this up beforehand rather than waste our volunteers' time cleaning it up?

Sometimes I wonder if they employ uploaders paid by the number of files uploaded. We seem to end up with books added page by page, in duplicate, from what should be a single DjVu document. Enhancing999 (talk) 18:20, 24 July 2024 (UTC)

"Any edit on the version you consider secondary is a waste of curation energy." Yes, obviously, and I fully agree on that. That's why there are categories like Category:LC TIF images with categorized JPGs: these images have JPG versions which are being maintained, and the TIF files in this category are uncategorized apart from being in that category and being referenced from the JPG file description page. This means that there is no need to ever make any edits to TIF files in that category, as only the "categorized JPGs" are maintained.

(On the topic of being paid per file uploaded: I don't think so; this is simply the format those scans are stored in, and with proper software to handle it, there isn't anything wrong. But Commons doesn't have the software, and I fully support efforts to convert single-page uploads of books into PDF. The core problem is that the Commons software is not designed to handle the same media in multiple file formats, unlike e.g. the Internet Archive, which offers texts for download in countless different file formats from the same description page (random example).) TheImaCow (talk) 20:03, 24 July 2024 (UTC)

This seems like over-optimization. Things work; they are a bit slower because someone thought having a very high resolution in a very badly compressible format was desirable. The side effect of very large originals is that it takes a while before the thumbnail is ready. But for 99.9% of people that isn't a problem. Thumbnails are cached. If images are used, you therefore never have to wait for the thumbnail. You are in the 0.1% of people (curators) looking at things that are NOT used. It's acceptable to wait a second in that case. —TheDJ (talk • contribs) 13:47, 24 July 2024 (UTC)

"Someone thought it was desirable": the Library of Congress used an archival format to archive the images, but we are not the LOC and have different goals, so we should use file formats better suited to our uses.

The motto of this site is "freely usable media". This also means not needing a very fast internet connection or special programs to process or even fully view the file (at least Edge/Firefox cannot show raw .tif files in the browser, e.g. when trying to zoom in further), and this list could be expanded endlessly. We shouldn't forget our actual end users, who in general have much less knowledge about dealing with such file types, or generally anything.

Another issue not yet mentioned is that TIFF files are not indexed by Google Image Search and presumably other search engines, which is bad for obvious reasons. (Search for site:commons.wikimedia.org carol highsmith, and there are only a couple hundred images which have been manually converted to JPG, but not a single one of the 30,000 TIFs; appending filetype:TIFF doesn't return anything at all.) TheImaCow (talk) 17:02, 24 July 2024 (UTC)

If Google is broken, then we don't want users at Commons having to fix it. Enhancing999 (talk) 18:22, 24 July 2024 (UTC)

This is fully intentional; as demonstrated plenty of times, TIFF is a format simply not suitable for general web use. TheImaCow (talk) 20:05, 24 July 2024 (UTC)

Weirdly, it appears to remove JPGs, see #c-Enhancing999-20240701184400-TheImaCow-20240630215900. Enhancing999 (talk) 10:38, 25 July 2024 (UTC)

Comment Yes, on Commons, we need JPEG. If the source may not be available in the long term, we should also upload the original TIFF versions, but that is not the case here. Be sure to link both versions. That was not done for some other files uploaded from LOC or NARA, and it is now a mess. Yann (talk) 18:09, 24 July 2024 (UTC)

@TheImaCow, Enhancing999, Adam Cuerden, Jeff G., TheDJ, and Yann: I've started a discussion at Commons:Village pump#Should we convert all TIFFs to JPEGs?. I hope this is the right place to garner a wider consensus on this -- DaxServer (talk) 16:46, 3 August 2024 (UTC)

Cities in Finland and China by month


Hello! I would like to ask you to automatically create categories for the distribution of cities in Finland and China by month. There are corresponding templates: {{MonthinFinlandbycity}} and {{MonthinChinabycity}}. MasterRus21thCentury (talk) 07:22, 17 July 2024 (UTC)

Hi @MasterRus21thCentury Could you explain it a bit more? Thanks! -- DaxServer (talk) 21:26, 2 August 2024 (UTC)

Remove extraneous "I, " in author param of PD-self


@Pelikana noticed that there are a lot of erroneous uses of {{PD-self}} which insert "I, " before the author. Could someone please replace {{PD-self|author=I, with {{PD-self|author= in the following pages? —CalendulaAsteraceae (talk • contribs) 06:52, 24 July 2024 (UTC)
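The requested find-and-replace, expressed as a Python regular-expression sketch. It only touches the exact {{PD-self|author=I, ... form quoted above (author as the first parameter), so a name that merely starts with "I" is not affected; other parameter orders would need a broader pattern. The function name is hypothetical.

```python
import re

# Matches "{{PD-self|author=I, " (the first letter of the template name may be lower case).
PD_SELF_I = re.compile(r"(\{\{\s*[Pp]D-self\s*\|\s*author\s*=\s*)I,\s*")


def drop_first_person_prefix(wikitext: str) -> str:
    """Remove a leading 'I, ' from the author parameter of {{PD-self}}."""
    return PD_SELF_I.sub(r"\1", wikitext)
```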

I can do that -- DaxServer (talk) 09:46, 24 July 2024 (UTC)

@CalendulaAsteraceae and @Pelikana: "I, " is there to make the assertion first person. — 🇺🇦Jeff G. ツ please ping or talk to me🇺🇦 10:21, 24 July 2024 (UTC)

Hi @Jeff G., yes, it obviously used to be there to make the assertion first person. But I think at some point the text lines were changed, and now IMHO it is a displaced element; it is also very odd that it is the only untranslated text element in the template, at least in these use cases. Do you mean to say the results are completely correct this way and need no change? Both lines seem grammatically faulty to me: "... door de auteur, I, JohnDoe" ("... by the author, I, JohnDoe") and "I, JohnDoe allows ...". The last one should read (in Dutch) "Ik, JohnDoe sta ...." ("I, JohnDoe allow ..."). It should not read "I, JohnDoe staat ..." ("I, JohnDoe allows ..."), because this line starts in the first person and ends in the third person. In later years (after 2007-2008), "I, " no longer seems to be in the templates. Peli (talk) 10:52, 24 July 2024 (UTC)

Indeed. The template uses {{int:Wm-license-pd-author-with-author-text}}, which produces the text "This work has been released into the public domain by its author, $1. This applies worldwide." The appropriate way to make this first person would be to edit the page on TranslateWiki (well, the English one needs to be changed in MW code, but for other languages this is where you'd edit it), not to manually put "I, " in the author parameter. —CalendulaAsteraceae (talk • contribs) 20:50, 24 July 2024 (UTC)

I think it is a good idea to add "I, " as a prefix if the uploader is also the work's creator. Please don't replace that. For example, it may not be clear to many, and some people only (or first) check the author field, where this is useful metadata, especially if the author name is different from the username, in which case they would also need to check the license template. Prototyperspective (talk) 12:04, 25 July 2024 (UTC)

This is a good thing to handle in {{PD-self}} (which is a template only intended to be used by the uploader). Adding it manually means it's a huge pain to update if the wording of the template changes, and also doesn't work with internationalization. Right now

{{int:Wm-license-pd-author-with-author-text|I, Calendula}}

produces

This work has been released into the public domain by its author, I, Calendula. This applies worldwide.

in English, which is ungrammatical and frankly silly. If I switch my display language to Spanish, it instead produces

Este trabajo ha sido liberado al dominio público por su autor, I, Calendula. Esto aplica para todo el mundo.

which is even worse. If you want to change the wording of {{PD-self}}, probably the way to go is switching in the template from int:Wm-license-pd-author-with-author-text to something like int:Wm-license-pd-author-self-text that incorporates the author's name. —CalendulaAsteraceae (talk • contribs) 19:28, 25 July 2024 (UTC)

I couldn't find an existing piece of text, so I submitted a feature request at phabricator:T371057. I think that further discussion of updates to the text of {{PD-self}} should go to the template talk page, and also that this bot request should go ahead, because manually adding "I, " before the author's name is a terrible way to make the template first person. —CalendulaAsteraceae (talk • contribs) 20:28, 25 July 2024 (UTC)

You're absolutely right. Sorry, I misunderstood. It wasn't really clear in your initial post that this would be added to the template instead. Prototyperspective (talk) 21:02, 25 July 2024 (UTC)

I agree that this is just about cleaning up a tiny bit of lost and redundant text on a limited number of pages, and I would be glad if @DaxServer got the green light to fix this series of typos on these old pages by deleting "I, ". Thanks. Peli (talk) 21:44, 4 August 2024 (UTC)

Images with borders (MTC)


Many images in Category:Independence Day 2019 in Brasília have a border. Sample: File:Comemoração da Independência do Brasil (48700486098).jpg

These should be added to Category:Images with borders. Possibly the same applies to more files from the same MTC Flickr stream. Enhancing999 (talk) 11:18, 27 July 2024 (UTC)

Assuming the border always has the www.mctic.gov.br website mark at the bottom left, here's what I thought of: load the image with OpenCV and extract the bottom-left part, use Tesseract to OCR the website text, then do a sequence match between the extracted text and the website string; if the similarity is high enough, the file can be categorized.

Here is some sample code: https://www.kaggle.com/code/daxserver/detecting-borders-from-brazil-mtc-flickr-images/ -- DaxServer (talk) 17:25, 27 July 2024 (UTC)
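A stdlib-only Python sketch of the last step of that approach, the sequence match. It assumes the OpenCV crop and Tesseract OCR have already produced a text string, and compares each OCR token against www.mctic.gov.br with difflib. The 0.8 threshold is an arbitrary assumption that would need tuning on real files; it is not taken from the linked notebook, and the function names are hypothetical.

```python
from difflib import SequenceMatcher

EXPECTED_MARK = "www.mctic.gov.br"  # website mark assumed to sit in the bottom-left border


def border_mark_score(ocr_text: str, expected: str = EXPECTED_MARK) -> float:
    """Best similarity (0 to 1) between any whitespace-separated OCR token and the mark."""
    tokens = ocr_text.lower().split()
    return max((SequenceMatcher(None, expected, token).ratio() for token in tokens), default=0.0)


def has_mctic_border(ocr_text: str, threshold: float = 0.8) -> bool:
    """True if the OCR output plausibly contains the website mark."""
    return border_mark_score(ocr_text) >= threshold
```

Scoring token by token means surrounding caption text in the crop does not dilute the match, and a single misread character (e.g. "mctlc") still scores well above the threshold.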

I did some screening on Category:Independence Day 2019 in Brasília by changing the background color of the page. It appears that there are a few images without a border. The ones I checked were all from other Brazilian government agencies. Sample: File:07 09 2019 - Desfile 7 de setembro. (50751888331).jpg.

The magic border locator of the crop tool does work fairly reliably on these images. Sample: https://croptool.toolforge.org/?site=undefined&title=Comemora%C3%A7%C3%A3o%20da%20Independ%C3%AAncia%20do%20Brasil%20(48700486098).jpg&page=undefined

The only problem with directly cropping them seems to be that the file description pages don't include all details from the borders. Enhancing999 (talk) 10:53, 29 July 2024 (UTC)

The magic borders module is interesting. Perhaps we can employ that to detect a border. I'll do some tests -- DaxServer (talk) 13:57, 29 July 2024 (UTC)

Missing "-" in coordinates (MX)


Some of the Mexico images by a former contributor show locations in Asia [7]. This seems to be due to a missing "-" in the coordinates. Sample fix: Special:Diff/905615293. I fixed a few myself. Enhancing999 (talk) 10:18, 1 August 2024 (UTC)

Done: ~74 edits. I did it with pwb replace using my normal account -- DaxServer (talk) 12:08, 1 August 2024 (UTC)
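For reference, a Python sketch of the kind of check such a replacement could make, assuming the coordinates are written in the decimal form {{Location|lat|lon}}. A positive longitude is negated only when the point falls inside a rough, hand-picked bounding box for Mexico (latitude 14 to 33, longitude 86 to 119). In the degrees/minutes/seconds form the second value is minutes, which never reaches 86, so those are left alone. The box and the function name are assumptions; the actual edits above were made with pwb replace.

```python
import re

# {{Location|lat|lon...}}; only a non-negative second value can match the last group.
LOCATION = re.compile(
    r"(\{\{\s*[Ll]ocation\s*\|\s*)(-?\d+(?:\.\d+)?)(\s*\|\s*)(\d+(?:\.\d+)?)"
)

# Rough bounding box for Mexico (approximate, for illustration only).
LAT_RANGE = (14.0, 33.0)
LON_RANGE = (86.0, 119.0)  # degrees west written without the "-"


def restore_western_longitude(wikitext: str) -> str:
    """Add the missing '-' to longitudes that would otherwise land in Asia."""

    def fix(match):
        lat, lon = float(match.group(2)), float(match.group(4))
        if LAT_RANGE[0] <= lat <= LAT_RANGE[1] and LON_RANGE[0] <= lon <= LON_RANGE[1]:
            return f"{match.group(1)}{match.group(2)}{match.group(3)}-{match.group(4)}"
        return match.group(0)  # outside the box: leave untouched

    return LOCATION.sub(fix, wikitext)
```

Limiting such a run to the one contributor's uploads, as in this request, keeps genuine Asian coordinates from ever being considered.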

Add OCR output to jpg


From the discussion at VP/T, I found a solution to a problem identified earlier: we frequently have images of streets and other subjects with some text in them. Sometimes this text is of interest, but it's not necessarily included in the filename or description.

https://ocr.wmcloud.org/ would allow extracting such text and making it editable on Commons.

Ideally a bot would go through new uploads (and also some maintenance category for older files) and run https://ocr.wmcloud.org/ on it. The output (if any) could be added to the file description page, either with a template or as structured data.

Sample file:

File:Plaque Rue Lauriers - Gournay-sur-Marne (FR93) - 2021-10-04 - 1.jpg

Input:

https://ocr.wmcloud.org/?image=https%3A%2F%2Fupload.wikimedia.org%2Fwikipedia%2Fcommons%2F3%2F36%2FPlaque_Rue_Lauriers_-_Gournay-sur-Marne_%2528FR93%2529_-_2021-10-04_-_1.jpg&engine=google&psm=3&line_id=null

Output:

"PER PONTEM AD FORTUNAM GOURNAY-SUR-MARNE RUE DES LAURIERS"

Enhancing999 (talk) 15:16, 2 August 2024 (UTC)
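A Python sketch of how a bot could build the "Input" link above for any file, assuming the OCR tool accepts the same query parameters as the sample (image, engine, psm, line_id); those values are copied from the sample link, not from documentation. The original-file URL uses MediaWiki's MD5-based hashed upload directories. Fetching the result and writing it back (as a template or structured data) is not shown, and the function names are hypothetical.

```python
import hashlib
from urllib.parse import quote, urlencode

OCR_TOOL = "https://ocr.wmcloud.org/"


def commons_upload_url(file_name: str) -> str:
    """Original-file URL, using MediaWiki's MD5-hashed upload directory layout."""
    db_key = file_name.replace(" ", "_")
    digest = hashlib.md5(db_key.encode("utf-8")).hexdigest()
    return (
        "https://upload.wikimedia.org/wikipedia/commons/"
        f"{digest[0]}/{digest[:2]}/{quote(db_key)}"
    )


def ocr_tool_url(image_url: str, engine: str = "google", psm: int = 3) -> str:
    """Build a link in the same shape as the sample 'Input' URL."""
    params = {"image": image_url, "engine": engine, "psm": psm, "line_id": "null"}
    return OCR_TOOL + "?" + urlencode(params)
```

The image URL is percent-encoded a second time inside the query string, which is why "(" appears as %2528 in the sample link.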

Also please see the discussion at VP/T linked above. Just briefly adding support to this wish, and two notes:
First, it would likely be a problem to scan all files on WMC and/or all new uploads; instead, one could let the bot run only on categories where this may be useful. Secondly, rather than writing a new bot, it would probably be better to add this functionality to a bot that already writes e.g. structured data to lots of files, like SchlurcherBot (however, SD can't be searched on WMC, can it?). Prototyperspective (talk) 10:50, 4 August 2024 (UTC)

I think there are already some bots that scan all uploads; it could obviously be added to those. If SD is used, we should make sure it's searchable. Enhancing999 (talk) 13:47, 4 August 2024 (UTC)

Move "Historical images of" to "History of"


Per the note at Category:Historical images by country (the conclusion of Commons:Categories for discussion/2019/09/Category:Historical images), the content of the categories at Special:PrefixIndex/Category:Historical images of should be moved to "History of". This seems to involve more than 10,000 categories; see PetScan:29034509. I think the resulting redirects could afterwards be tagged for speedy deletion. Enhancing999 (talk) 18:59, 2 August 2024 (UTC)
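A Python sketch of the per-page edit such a bot would make, assuming member pages are recategorized by editing their [[Category:...]] links (sort keys kept) and that the target name is simply "History of" plus the rest of the old name, as the CfD note describes. Tagging the emptied redirect for deletion and moving the category pages themselves are not shown; function names are hypothetical.

```python
import re

OLD_PREFIX = "Historical images of "
NEW_PREFIX = "History of "


def target_category(name: str):
    """'Historical images of X' -> 'History of X' (names without the 'Category:' prefix)."""
    if not name.startswith(OLD_PREFIX):
        return None
    return NEW_PREFIX + name[len(OLD_PREFIX):]


def _category_link(name: str):
    # Spaces and underscores are interchangeable in page titles.
    title = "[ _]".join(re.escape(part) for part in name.split(" "))
    return re.compile(r"\[\[\s*[Cc]ategory\s*:\s*" + title + r"\s*(\|[^\]]*)?\]\]")


def recategorize(wikitext: str, old_name: str, new_name: str) -> str:
    """Move a page from one category to another, keeping any sort key."""
    old_link, new_link = _category_link(old_name), _category_link(new_name)
    if new_link.search(wikitext):
        # Already in the target category: just drop the old link.
        return old_link.sub("", wikitext)
    return old_link.sub(lambda m: f"[[Category:{new_name}{m.group(1) or ''}]]", wikitext)
```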

I don't think it's a good idea to handle this problem without human supervision.

I would rather do these instead:

Prohibit new categories with the word from being created.
Let users slowly move the files to the appropriate categories (by time).

RZuo (talk) 20:42, 2 August 2024 (UTC)

"History of ..." is not any better. Everything is history. RZuo (talk) 20:43, 2 August 2024 (UTC)

Right, any cutoff for "history" will change every second/minute/hour/week/month/year/century/millennium. See also Commons:Categories for discussion/2024/08/Category:History by country. — 🇺🇦Jeff G. ツ please ping or talk to me🇺🇦 10:20, 4 August 2024 (UTC)

There's specific interest related to the "history" of something.

For example, files for historians of Asian history should go under "History of Asia".

But dumping files into "History of xx" is no better than dumping them into "xx" or "Historical images of xx". All files of xx fit perfectly into all three of those variations.

Most of these "Historical images of xx" categories basically contain all photographs from before the advent of digital photography, especially black-and-white photographs.

So I'd rather users move these categories to, or create, for example, "xx in the 19th/20th century". RZuo (talk) 12:49, 4 August 2024 (UTC)

I have an idea for a bot moving files according to time/date, but I probably need 1 or 2 years to code something like that. RZuo (talk) 12:53, 4 August 2024 (UTC)

I don't think this is the place to re-discuss the CfD. If you think the closure is problematic, ask an admin to re-open it. Enhancing999 (talk) 12:57, 4 August 2024 (UTC)

There is just no way this can be done manually. If there are cases you think would be problematic, please state them here. Enhancing999 (talk) 20:56, 2 August 2024 (UTC)

Support Per Enhancing999. There are currently 33,732 categories for "historical images", which is way too many for anyone to deal with manually. This also isn't the place to relitigate the CfD. Nor do I think doing so would go anywhere anyway, since it was open for 4 years and has been closed since last year. So there has been plenty of time for people to raise concerns about it. Most of these categories only contain a couple of images to begin with, and they aren't "historical" either. The idea that we should let users slowly move the files to the appropriate categories when it's only a couple of images per category to begin with is totally ridiculous and would just waste everyone's time. There's no reason people can't better categorize the images once they are moved to "History of xx" categories. That's where most of the images were in the first place. Regardless, this should definitely be done by a bot instead of forcing users to waste time doing it manually. --Adamant1 (talk) 07:27, 10 August 2024 (UTC)

P625 to P1259

I just mistakenly added coordinate location (P625) to files. Is there a bot already running that automatically migrates P625 statements to ~~location of creation (P1071)~~ coordinates of the point of view (P1259)? RZuo (talk) 10:37, 6 August 2024 (UTC)

I made a mistake again: it seems P1071 is meant for Wikidata items of actual places, not coordinates.

Commons:Structured data/Modeling/Location, important info.--RZuo (talk) 11:29, 6 August 2024 (UTC)
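The data side of such a migration is small. Below is a minimal sketch, assuming the file's MediaInfo entity JSON has already been fetched and keeps its statements under a "statements" key. It only re-keys P625 statements to P1259 in a copy of the entity; removing the old statements and saving the new ones through the API is left out, and the function name is hypothetical.

```python
import copy


def migrate_coordinates(entity: dict, src: str = "P625", dst: str = "P1259") -> dict:
    """Return a copy of a MediaInfo entity with `src` statements re-keyed to `dst`.

    Assumes statements live under entity["statements"] and that each
    mainsnak carries its own "property" field.
    """
    result = copy.deepcopy(entity)  # never mutate the fetched entity
    statements = result.setdefault("statements", {})
    moved = statements.pop(src, [])
    for claim in moved:
        claim["mainsnak"]["property"] = dst
        claim.pop("id", None)  # a new statement needs a new GUID when saved
    if moved:
        statements.setdefault(dst, []).extend(moved)
    return result
```

Running it continuously would mean applying this to every file that gains a P625 statement, which is why it fits a bot job better than a one-off OpenRefine run.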

OpenRefine can do that. I can help if you'd like. -- DaxServer (talk) 10:58, 6 August 2024 (UTC)

Doing it manually is easy, but this should be a continuous bot job.--RZuo (talk) 11:29, 6 August 2024 (UTC)

See Module_talk:Coordinates#Usage_of_P625?. Multichill (talk) 17:21, 13 August 2024 (UTC)

Media missing infobox template

There are about 339,000 files in the category Media missing infobox template. Even using add_information.php (or the gadget), the task is too huge to be done manually. I assume that would be a nice job for a bot. A simple search/replace wouldn't be sufficient, since some file pages contain {{Filedesc}} and {{License-header}}, which should be preserved. Additionally, some files have information on sources, e.g. 1884 South Penn RR.jpg. Those should be used for the source parameter of the information template. Fl.schmitt (talk) 19:22, 10 August 2024 (UTC)

Maybe a list of the most used files could be generated from the category, and these could be done manually? Also, please keep in mind COM:GOF. Enhancing999 (talk) 07:09, 12 August 2024 (UTC)

Good idea - restricting to the most used files is reasonable. Additionally, I thought about grouping by uploader / author, which would facilitate automatic editing. Fl.schmitt (talk) 07:25, 12 August 2024 (UTC)

I tried Special:Search/switzerland incategory:"Media_missing_infobox_template" and then used Petscan:29082230 to find the uploaders.

This found images like File:Runs_Kapelle.jpeg by "Ikiwaner" who uploaded plenty of own pictures which is clearly indicated, but even add-information can't complete it.

One would think that we'd have more pictures of these places almost 20 years later, but sometimes we don't. Enhancing999 (talk) 08:30, 12 August 2024 (UTC)

Looks very interesting! The problem with add-information.php is that it has to transform arbitrary input, which is IMO almost impossible. With pre-structured data (known author/uploader, known structure of file description), maybe the task can be automated to a certain extent. Limiting the input by location is a good idea! Fl.schmitt (talk) 08:55, 12 August 2024 (UTC)

add-information.php seems relatively good based on the input, but a review seems necessary.

Even filtering by uploader can give a large range of complicated cases (especially old imports from other wikis). Adding a search for "own photograph" (or similar) can simplify things. Enhancing999 (talk) 12:02, 12 August 2024 (UTC)

Maybe we could list groups of similar cases somewhere, so someone else can determine if they want to assess them further (or whether they are all actually similar). Samples:

Mapmakers
- Tschubby maps (1100), see Category:Media missing infobox template (maps t1)
- AHoerstemeier map (275)
- Vardion map (288)
Photographers/image sources
- Famfamfam flag icons (250)
- Peter Berger (50)
- Picswiss (294)
- NASA (4794)
- Lienhard Schulz (270)
- Dake Switzerland (82)
- Carlo Ponti (31)
- Giorgio Sommer (75)
- Arnaud Gaillard (105)
- Qwertzy2 (81)
- CdaMVvWgS (65)
- Marcel.C (7)
- Flyout (46)
- Francis Frith (28)
- Matthäus Merian (160)
- Marc Mongenet (68)
- Markus Bernet (40)
- Julo (524): possibly several
- Flickr (4483)
- Crops from Mathematikerkongress, Zürich 1932 (all done)
Copyright status
- PD-Old (22630): Category:PD Old
- PD Official documents (115)
Added complexity
- original upload log-header (10450): Template:original upload log (initial upload at Wikipedia)
- transferred from (458): Template:Transferred from (not included in previous)
- Files_moved_to_Commons_from_Wikipedia: Category:Files_moved_to_Commons_from_Wikipedia (possibly included in previous, likely not)
- derived from (91): Template:derived from
- extracted from (1641): Template:Extracted from
- Bilderwerkstatt (387): Template:Bilderwerkstatt
- images with annotations (1765): Category:Images with annotations (doesn't work well with add_information)
- Files in need of review (sources) (4705): Category:Files in need of review (sources) (possibly this doesn't go beyond the file not having the information template)
Personal templates, see Category:Media missing infobox template (personal templates)
- RHaworth personal template (25): user:RHaworth/mylic
- IUCN category/Pengo (362): Template:Pengo IUCN
- Fb78 (122): User:Fb78/Licence
- Twice25 (270): user:Twice25/Crediti
- CNG (343): Template:CNG
Image types
- coat of arms (16323), some in Category:Media missing infobox template (coats of arms)
- insignia (11356)
- flag (13852)
- currency (423), see Category:Media missing infobox template (currency)
- logo (7826)
- Wikipedia brand (957): Category:Trademarks and logos of Wikimedia (also included in previous)
- kit body (769)
- ChemDraw (490)

Enhancing999 (talk) 15:01, 12 August 2024 (UTC) updated
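Counts like the ones in the "Added complexity" group can be reproduced from page wikitext with a few patterns. This is a minimal sketch, assuming the wikitext of each file page is already available (e.g. from a dump or a PetScan-derived list). The pattern table is hypothetical and covers only some groups from the list; it treats the first letter of a template name as case-insensitive and spaces/underscores as equivalent, as MediaWiki does.

```python
import re
from collections import Counter

# Hypothetical subset of the groups listed above.
GROUP_PATTERNS = {
    "original upload log": r"\{\{\s*[Oo]riginal[ _]upload[ _]log",
    "transferred from": r"\{\{\s*[Tt]ransferred[ _]from",
    "derived from": r"\{\{\s*[Dd]erived[ _]from",
    "extracted from": r"\{\{\s*[Ee]xtracted[ _]from",
    "flickr": r"(?i)flickr\.com",
}


def classify(wikitext: str) -> list[str]:
    """Return every group whose marker appears in the page text."""
    return [name for name, pattern in GROUP_PATTERNS.items() if re.search(pattern, wikitext)]


def count_groups(pages: dict[str, str]) -> Counter:
    """Count how many pages fall into each group (a page may be in several)."""
    counts = Counter()
    for text in pages.values():
        counts.update(classify(text))
    return counts
```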

We could do subcategories of Media missing infobox template for maps, logos, coats of arms, insignia, currency, flags and personal templates. There is already one for artwork.

Interesting to compare the early digital photos with others we have: sometimes it still looks the same, others lack any comparable one, sometimes it's clearly aged, sometimes it gives a historic comparison, sometimes in a larger set we lack clearly better ones.

BTW, image notes seem to be handled badly by add-information (they get mixed into the description). Header handling could be improved too. I don't think I ever had one that didn't need editing (that seems to be the idea anyway). Besides, I try to complete them. Enhancing999 (talk) 13:38, 13 August 2024 (UTC)

@Enhancing999: Great work, this is very helpful. I've started with the maps provided by Tschubby, because it seems that most of the file descriptions share the same structure. Please check Revision #909185535 of Karte Gemeinde Troinex.png for a regex-based replacement by pywikibot. IMO, this looks ok. Fl.schmitt (talk) 16:49, 13 August 2024 (UTC)

The problem with File:Karte Gemeinde Troinex.png is that it wasn't uploaded by Tschubby, so {{Own}} isn't applicable.

Supposedly that file and File:Carte Commune Troinex.png are based on a file that was initially uploaded at de:File:Karte Gemeinde Troinex.png, see https://de.wikipedia.org/w/index.php?title=Spezial:Logbuch&logid=283755 . Normally the file description page would include a copy of the upload log from dewiki, but it doesn't. File:Glacier.zermatt.arp.750pix.jpg had some details I added after "own".

BTW, Tschubby is still very active, so he might have a view on how he prefers them to be done, or do them directly himself. Enhancing999 (talk) 17:03, 13 August 2024 (UTC)

If it's the same file, initial upload was: [8]. Enhancing999 (talk) 17:09, 13 August 2024 (UTC)

hmm - ok - yes, seems I was too optimistic... it's clear that getting this done by a bot will never reach the quality of manually checking / editing all the parameters. So we will have to decide which grade of completeness is achievable / required. Searching for other / derived / source versions can only be done manually, I think. So if this is a requirement, there's no way to get this task done by a bot, not even a small part of this task.

What's possible IMO is to group the files by the structure of their description, maybe additionally by uploader and year/month of upload, and do a regex-based replacement. This may lead to incomplete Information/Map/Artwork templates, e.g. if there's no information regarding the source.

Regarding the parameters:

Setting the source parameter may be possible (1) if the source is stated in the description or (2) if uploader is identical with author. In other cases, the source can't be set automatically.
Setting the exact upload date will be very difficult if we use pywikibot's replace script. If using the upload's year and month is sufficient, one could group the files accordingly, based on a PetScan search. This depends on the required/acceptable grade of precision.

Fl.schmitt (talk) 18:13, 13 August 2024 (UTC)

Still trying to get the {{Upload date}} template working... Fl.schmitt (talk) 16:51, 13 August 2024 (UTC)

I try to avoid upload date. Weirdly, add-information tends to get even the exif date wrong. For Tschubby's municipality maps, it may be sufficient to add the year they are meant to be current (borders don't change that frequently). Enhancing999 (talk) 15:00, 15 August 2024 (UTC)

When you can identify a common pattern in some file sets, Commons:AWB or JWB might be a good tool. RZuo (talk) 22:04, 14 August 2024 (UTC)

What would be cool for add-information is if one could use it with some defaults (description language, author, date, {{Taken on}}-location, source, other fields, license, etc) for a given subset.

Also, a few bugs might be worth fixing (licence header formatting, keeping image annotations together, placement of coordinates template, exif dates) if others plan to use it (I'm mostly done with the subset I'm looking into). Enhancing999 (talk) 15:05, 15 August 2024 (UTC)

Enhancing999, thank you for tackling this long-neglected problem. I like your divide-and-conquer approach, and I agree with RZuo that Commons:AWB might be a good tool to use. That is what I used when I was adding infoboxes some years ago. Another possible approach might be to start adding com:SDC data like author, description and date with the QuickStatements tool. If you do that, then you can just add the {{Information}} template with no parameters and it will display the SDC data. See File:Indoor_Climbing_Kid.jpg for an example. If you have any questions about this approach, I can explain in more detail. --Jarekt (talk) 04:21, 16 August 2024 (UTC)

Good idea indeed. This could simplify adding only one aspect at a time (not everything can be determined with the same ease). ~~Once sufficient data for {{Information}} is available, the template could be added.~~ We just need to be careful that basic information available as statements is also otherwise visible.

BTW, one would think that it's an old issue, but sometimes even recent uploads don't have a template (or someone deleted it).

If it's thought helpful for others, I can create subcategories for some or most of the above groups (obviously they should be deleted easily once empty or if a better one can be found).

If it's easy to add by bot, a subcategory for frequently used files could be helpful (it's doable with PetScan for a relatively small set, but not for all 337000 files in the category). In the subset I checked, few had more than 30 main namespace uses (sample, now with template). Enhancing999 (talk) 11:21, 16 August 2024 (UTC)

Flickr might be a good start for adding {{Information}} through statements only. We currently have ca. 4500 files mentioning Flickr. Some 2100 have both creator and source. An issue with some of these seems to be that they are blank. I brought this up at Schlurcherbot. Wouldn't the various Flickr templates also include source and creator? Enhancing999 (talk) 15:01, 16 August 2024 (UTC)

@Enhancing999 thank you for creating the hiddencat - this makes it easier to get a clearly defined set of files as input for bulk modifications! I'm currently working on a bot that should be able to work through the grouped files, preferably writing SDC data wherever possible. But there are some points where I'm not sure about:

Date: We may simply take the year (as you've proposed earlier), but I found it would be quite easy for a bot (from a technical point of view) to use the oldest upload date. Is there a way to use inception (P571), qualifying the date as {{Upload date}} in SDC?
Source: We can use either original creation by uploader (Q66458942) (if uploader and creator seem to be identical) or own work by the original uploader (Q87402110) (in other cases). I wonder if there's a way to additionally point to the source Wikipedia (e.g. German Wikipedia)?

Fl.schmitt (talk) 16:11, 22 August 2024 (UTC)
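For the two points above, a bot would need two values: the source item, chosen by the rule described (Q66458942 vs. Q87402110), and a date value built from the oldest upload. This is a sketch only, with hypothetical function names. It assumes the standard Wikibase time-value layout (day precision 11, proleptic Gregorian calendar Q1985727). It does not answer the open question of how to qualify the date as an upload date; that still needs a decision on Commons_talk:Structured_data or similar.

```python
ORIGINAL_CREATION_BY_UPLOADER = "Q66458942"
OWN_WORK_BY_ORIGINAL_UPLOADER = "Q87402110"
GREGORIAN = "http://www.wikidata.org/entity/Q1985727"


def source_value(uploader: str, creator: str) -> str:
    """Pick the source item following the rule proposed above."""
    if uploader == creator:
        return ORIGINAL_CREATION_BY_UPLOADER
    return OWN_WORK_BY_ORIGINAL_UPLOADER


def inception_from_upload(timestamp: str) -> dict:
    """Turn an ISO upload timestamp into a day-precision Wikibase time value."""
    day = timestamp[:10]  # "YYYY-MM-DD"
    return {
        "time": f"+{day}T00:00:00Z",
        "timezone": 0,
        "before": 0,
        "after": 0,
        "precision": 11,  # 11 = day, 10 = month, 9 = year
        "calendarmodel": GREGORIAN,
    }
```

Using year precision instead (as suggested earlier for the maps) would only change the precision value and the date string.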

At Commons_talk:Structured_data you might find help with the question specifically about structured data.

If {{Information}} has no date, the line doesn't even appear as missing. Sample: Special:Diff/914493166.

I noticed some uploaders use {{Own}} and directly link their username at Wikipedia. Not sure how bots handle this.

Reimports from Wikipedia are tricky in general. See also: Commons:Village_pump#c-Jarekt-20240817151300-Asclepias-20240817140600 Enhancing999 (talk) 19:29, 22 August 2024 (UTC)

@Enhancing999 - it took some time, but my bot solution is almost ready for action. Since handling weakly structured data is tricky, the bot first prepares (and actually prepared) just a "simulation" result, without any "live" modifications of Commons pages. This "simulation" result shows the proposed modifications for a certain set of file pages lacking {{Information}}. The bot tries to add as much information as possible as SDC (esp. Date and Author) and doesn't repeat those values in the generated {{Information}} template, since the template uses those SDC values by default. So the template may look "incomplete" (for reference, see e.g. File:Karte Bodensee Birnau.png, where I added as much SDC as possible manually, leaving the respective fields in the Information template empty). The simulation result is available on GitLab in two formats: plain txt and SQL (sqlite). Before filing a bot request, I would be glad about any critical feedback regarding the proposed modifications. Fl.schmitt (talk) 18:31, 4 September 2024 (UTC)
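The "leave SDC-covered fields empty" rule can be sketched as a small template builder. This is a hypothetical helper, not Fl.schmitt's actual bot. It assumes the wikitext values for each field have already been extracted and that the caller knows which fields were written as SDC. Fields covered by SDC are emitted empty so {{Information}} falls back to the structured data.

```python
FIELDS = ["description", "date", "source", "author", "permission", "other versions"]


def build_information(values: dict[str, str], sdc_fields: set[str]) -> str:
    """Build an {{Information}} block, leaving fields already stored as SDC empty."""
    lines = ["{{Information"]
    for field in FIELDS:
        value = "" if field in sdc_fields else values.get(field, "")
        lines.append(f" |{field} = {value}".rstrip())  # no trailing space on empty fields
    lines.append("}}")
    return "\n".join(lines)
```

For example, `build_information({"description": "{{de|Karte}}", "date": "2005"}, {"date"})` keeps the description but leaves ` |date =` empty.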

Auto-addition of inferrable categories

Could somebody create a bot to add categories that are inferrable given the structured data (SD) that other bots have added or the combination of the file's existing categories?

I previously asked if this functionality could be added to User:SchlurcherBot here but it seems like it won't be done. Schlurcher (talk · contribs) said there already are some bots adding categories – maybe instead of creating new bot(s) for this it would be best to add this functionality to these (please name these because I don't know what the relevant bots would be).

SchlurcherBot for example already reads and parses the date field, it would be nice if a bot did the same and then added "Category:Videos of 2023" if that's in the field which the bot already writes into the "inception" field. It could also be put in a hidden subcat like "Uncategorized videos of 2023" so people can check these and/or it doesn't clutter the category which could also be specific to videos that show something specific to the year like an event (this doesn't seem to be the case currently).

Likewise, the bot already writes the display resolution to the SD but does not add the respective Category:Videos by display resolution subcat. Unless nearly all videos are in there, I don't see how this category (or its subcats) could be useful. If it was added to videos, then one could use it for statistics, PetScans and maybe other things. The same goes for the WebM videos category, which is currently up for deletion. Most WebM videos are missing there, so the category is largely useless. (Note that these two are exceptional cases: most WMC categories are useful.) If files were in there, one could for example use it as a workaround to find videos in PetScan, which currently can't filter for videos except by combining categories with the Category:Videos by file format cat. A bot could also populate Category:4K videos.
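The two examples above (year from the inception date, 4K from the resolution) are simple enough to sketch. This assumes the bot already has the inception as an ISO-style date string and the frame size in pixels. The 4K threshold (at least 3840x2160, either orientation) is an assumption, since the thread doesn't define it.

```python
def inferred_video_categories(inception: str, width: int, height: int) -> list[str]:
    """Suggest categories from data a bot has already parsed for the SD."""
    cats = []
    year = inception[:4]  # assumes "YYYY" or "YYYY-MM-DD..."
    if len(year) == 4 and year.isdigit():
        cats.append(f"Category:Videos of {year}")
    # Assumption: "4K" means UHD or larger, in landscape or portrait orientation.
    if max(width, height) >= 3840 and min(width, height) >= 2160:
        cats.append("Category:4K videos")
    return cats
```

Writing these into a hidden "Uncategorized videos of YYYY" subcat first, as suggested, would only change the category string.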

Some further examples of inferrable cats:

"Videos of 2024 from the United States" (depending on the license tag or other categories of the file or the coordinates)
Category:Audio files of 1906 (see here)
when a video file is in a category like "Muscidae" it should be added to Category:Videos of Muscidae
when a video file is in category "Azeliinae", which is a subfamily of Muscidae, it should also be added to "Videos of Muscidae" because that is the closest category to Muscidae which has a subcategory for videos (it would be best if a bot moved it to the more specific subcat once it is created)
files in Category:Animals in water should be moved to Category:Elephants in water if it's also in a subcat of elephants
it could make sure artworks like paintings (e.g. somewhere in a subcat of Visual arts) are in an "in art" subcat so paintings of elephants eating are not directly in the Elephantidae eating category and can e.g. be easily filtered out using the deepcategory search operator (or viewed alongside other such artworks)
a video in a subcategory of Category:Black and white films should go into Category:Black and white videos
After some delay to allow removal of metadata, categories like Category:Taken with Canon PowerShot A480 (example) or Category:Photographs by exposure time subcats based on the file's EXIF metadata

More difficult:

Videos without audio, Black and white videos, Category:Animated GIF files (for gifs that are animated e.g. from -deepcategory:"Animated GIF files" sorted by recency) if it can read the content to some degree like being able to check if the video has audio (the ones below are probably much easier to implement)
Category:Categories with contradictory categorization & Category:Categories with categorization contradicting their contents – inferrable cats may also be applied to categories and this maintenance cat could be added to cats that have contradictory categorization which can often be solved with a new subcat

More things could be added and refined to such an automatic categorization system(s) over time. There can be rare exceptions but having things auto-categorized with exceptional errors would be better than things missing and requiring lots of manual maintenance/subcategorization and there would be ways to deal with that (for example for video files in Category:Short films it would create a 'suggestion' to add Category:Short films videos) and move things out of ill-inferred categories where usually another cat of the file is false.

Here I suggested that video2commons adds as many of inferrable cats right away when uploading, such as "Videos with English subtitles" if it imports en subtitles.

It would be a big endeavor but addition of categories that are inferrable from other categories of a file would be very useful, very much improve the reliability / completeness / usefulness of categories, and free up categorizers' time for other tasks.

Maybe focusing this discussion on inferrable technical criteria cats for videos would be best for now. Prototyperspective (talk) 23:05, 17 August 2024 (UTC)

I'll occasionally add some more inferrable categories to the list. Added 2 and edited 1 now. --Prototyperspective (talk) 12:50, 30 August 2024 (UTC)

Generate a daily database report equivalent of Special:UncategorizedCategories

For each page, output:

Page name
Date time last edited
Creator name
User who last edited
Wikidata item (if available)

Ideally formatted in a template. Enhancing999 (talk) 14:27, 24 August 2024 (UTC)

Is there a way to find categories modified / created in the last 24 hours? Fl.schmitt (talk) 17:53, 24 August 2024 (UTC)

https://commons.wikimedia.org/w/index.php?sort=last_edit_desc&search=contentmodel%3Awikitext&ns14=1

https://commons.wikimedia.org/w/index.php?sort=create_timestamp_desc&search=contentmodel%3Awikitext&ns14=1 . RZuo (talk) 19:34, 24 August 2024 (UTC)

Thanks! But it's difficult to build a stable PagePile from a query... - here's a Quarry #85732 searching for Categories created today and a Petscan (psid: 29146248) based on that quarry (same for yesterday: Quarry #85736 / Petscan (psid: 29146272)). Would be nice if pywikibot could "update" the quarry automatically, since the pagepile is only a "snapshot" of the quarry result at the time the pagepile was created. Or the bot would have to query a replica db itself... Fl.schmitt (talk) 21:27, 24 August 2024 (UTC)

Something like Quarry:history/85731/922919/895636 should work (1187 rows today, ran in 27 minutes). I'm trying to add the Wikidata item. User names seem more difficult, but I guess we can do without. Enhancing999 (talk) 21:34, 24 August 2024 (UTC)

Username did work too Quarry:history/85733/922940/895656 (1167 rows, ran in 60 minutes). Username seems interesting too. Enhancing999 (talk) 21:47, 24 August 2024 (UTC)

with Wikidata item Quarry:history/85731/923002/895717 (1162 rows, ran in 27 minutes; 109 have items). Maybe I can combine the two. Enhancing999 (talk) 21:50, 24 August 2024 (UTC)

An initial output for now at Commons:Report_Special:UncategorizedCategories. Enhancing999 (talk) 22:48, 24 August 2024 (UTC)

@Enhancing999: Excellent work! Is there a way to create such a report automatically based on a Quarry? Fl.schmitt (talk) 10:01, 25 August 2024 (UTC)

That's the point of my bot request: the idea is that the bot runs the report daily and adds its result to the page. I'm still working on the optimal format.

(BTW, as another bot adds infoboxes to categories with items, there should hardly be any on the list. I reported this to User talk:Mike Peel.) Enhancing999 (talk) 10:08, 25 August 2024 (UTC)

Here is the combined query: Quarry:history/85733/923138/895851. It seems it runs quicker now that we worked to limit the number of categories on it. Enhancing999 (talk) 13:14, 25 August 2024 (UTC)

@Enhancing999: I've found that with Pywikibot, it's quite easy to get the metadata (last time edited, last user editing, wikidata item and so on) for any page or category. Here's a small python script that collects the metadata and creates a CSV file as result. CSV output is only for testing purposes - with some more effort, the script could output a wikitable (or even a static HTML page). The difficult part is getting a PagePile containing the uncategorized categories (doesn't even need to be a PagePile, but working on PagePiles is quite straightforward with Pywikibot). Fl.schmitt (talk) 06:01, 26 August 2024 (UTC)
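Turning the collected metadata into a wikitable for the report page is the easy last step. This is a sketch, assuming each row is a dict with hypothetical keys title, last_edit, creator, last_editor and item (item may be empty), matching the columns requested for the report. Values containing "|" would need escaping, which this sketch doesn't do.

```python
def to_wikitable(rows: list[dict]) -> str:
    """Render report rows as a sortable MediaWiki table."""
    lines = [
        '{| class="wikitable sortable"',
        "! Category !! Last edited !! Creator !! Last editor !! Wikidata item",
    ]
    for row in rows:
        item = f"[[d:{row['item']}|{row['item']}]]" if row.get("item") else ""
        cells = [
            f"[[:{row['title']}]]",  # leading colon links instead of categorizing
            row["last_edit"],
            f"[[User:{row['creator']}|{row['creator']}]]",
            f"[[User:{row['last_editor']}|{row['last_editor']}]]",
            item,
        ]
        lines.append("|-")
        lines.append("| " + " || ".join(cells))
    lines.append("|}")
    return "\n".join(lines)
```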

So it might be quicker to use a basic query and then enrich that afterwards. Enhancing999 (talk) 07:06, 26 August 2024 (UTC)

Yes, though it's important to include the namespace as page_namespace and the category name as page_title (column aliases in SQL); otherwise PagePile isn't able to create a valid result. Anyway, a bot would still require some time to work, since it needs to request all categories and their metadata. Depending on the server load, it seems that a bot is able to process about 6-7 categories per minute. So processing ca. 1,000 categories will still take between 2 and 3 hours (even when preloading the categories in batches). I'll have to check if there's a way to speed things up. Fl.schmitt (talk) 12:36, 26 August 2024 (UTC)
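The 2 to 3 hour estimate follows directly from the stated throughput of 6 to 7 categories per minute for about 1,000 categories:

```latex
\frac{1000}{7\ \text{min}^{-1}} \approx 143\ \text{min} \approx 2.4\ \text{h},
\qquad
\frac{1000}{6\ \text{min}^{-1}} \approx 167\ \text{min} \approx 2.8\ \text{h}
```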

Then it might be easier to do it all in SQL: Quarry:history/85733/923437/896130 took about 15 minutes, but speed on Quarry varies a lot. Enhancing999 (talk) 12:49, 26 August 2024 (UTC)

What a great query! Thanks for building this (also @Tjmj: ). Could you or somebody else fork it once more to also create a report for redcats (cats with only nonexisting categories)? Doesn't this query mean this is nearly solved? One would only need to get its result output and write it to some page on WMC and have a script do that regularly. Quite possibly there already is a tool for each of both. Prototyperspective (talk) 14:16, 26 August 2024 (UTC)

file description cleanup: "Uploaded with Reworkhelper"

At File:Bahnhofshalle_Zuerich-2.jpg, when adding {{Information}}, I noticed:

Uploaded with ''[http://tools.wikimedia.de/~luxo/reworkhelper.html Reworkhelper] {{Wayback|url=http://tools.wikimedia.de/~luxo/reworkhelper.html |date=20080419030339 }}''

in the source. There are other similar ones: Special:Search/insource:"tools.wikimedia.de/~luxo/reworkhelper.html" (1970)

I'd remove this from the file description pages and replace it with a category, e.g. Category:Uploaded with Reworkhelper, similar to others at Category:Files by upload tool.

@Luxo FYI
∞∞ Enhancing999 (talk) 13:46, 27 August 2024 (UTC)
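A cleanup like this is a plain text replacement plus a category append. This is a sketch, assuming the line appears exactly as quoted above, with or without the {{Wayback}} part. Category:Uploaded with Reworkhelper is the proposed (not yet agreed) name. Running it twice leaves the page unchanged.

```python
import re

# Matches the quoted line, optionally followed by its {{Wayback|...}} link.
REWORKHELPER = re.compile(
    r"\n?Uploaded with ''\[http://tools\.wikimedia\.de/~luxo/reworkhelper\.html Reworkhelper\]"
    r"(?: \{\{Wayback\|[^}]*\}\})?''"
)
CATEGORY = "[[Category:Uploaded with Reworkhelper]]"  # proposed name


def clean_reworkhelper(wikitext: str) -> str:
    """Remove the Reworkhelper note and add the tracking category once."""
    new_text, count = REWORKHELPER.subn("", wikitext)
    if count and CATEGORY not in new_text:
        new_text = new_text.rstrip("\n") + "\n" + CATEGORY + "\n"
    return new_text
```

The same pattern could be given to pywikibot's replace script or AWB instead. The Python version just makes the regex easy to test first.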

Add P1651 YouTube video ID structured data from "source" attribute of Filedesc template

To assist with de-duplication of Common Criteria YouTube video uploads, a bot could search uses of Template:Filedesc for a "source" attribute that contains YouTube links in the format of "^https:\/\/www\.youtube\.com\/watch\?v=([-_0-9A-Za-z]{11})$". The captured 11 character YouTube video ID can then be added as structured data for the video file using Wikidata's P1651 property. Dhx1 (talk) 01:54, 29 August 2024 (UTC)

Also see related request (approved) for adding P12120 Flickr photo IDs as structured data at Commons:Bots/Requests/FlickypediaBackfillrBot. Dhx1 (talk) 02:00, 29 August 2024 (UTC)

I have exactly the same idea and wanted to propose something in a few days.

@Multichill Is your bot capable of adding YouTube data in SDC, and perhaps also additional data related to it? -- DaxServer (talk) 07:22, 29 August 2024 (UTC)

@DaxServer: Focus is on own work so not really doing much identifier properties at the moment. Did propose one at d:Wikidata:Property proposal/VIRIN. Maybe make it a bit more generic bot to add identifiers based on templates?

For YouTube a bit of cleanup is needed. Probably switch to something like {{From YouTube}}, which can add a tracker category so a robot has a category to work on. Same goes for {{ID-USMil}} and other templates that have an identifier property. Multichill (talk) 10:59, 1 September 2024 (UTC)

We currently have 166,884 files resulting from this search. That template doesn't mention categorization. — 🇺🇦Jeff G. ツ please ping or talk to me🇺🇦 15:28, 1 September 2024 (UTC)

Jeff, the template mentions that it categorizes into Category:Media from YouTube. Are we talking about the same {{From YouTube}}? -- DaxServer (talk) 18:42, 1 September 2024 (UTC)

@DaxServer: Thanks, I missed it at first. — 🇺🇦Jeff G. ツ please ping or talk to me🇺🇦 21:58, 4 September 2024 (UTC)

@Multichill Yes, please make the bot generic so it can be expanded with ease 🙏 -- DaxServer (talk) 20:26, 1 September 2024 (UTC)

Add missing Template:Location

Some images seem to have coordinates in SDC, but not display them on the file description page.

Special:Search/haswbstatement:P1259 -hastemplate:"Module:Coordinates" doesn't find them. Is there a way to complete them?
∞∞ Enhancing999 (talk) 15:14, 29 August 2024 (UTC)

@Enhancing999: completing the file description isn't too difficult once the files are found. Fl.schmitt (talk) 17:53, 1 September 2024 (UTC)

I'm trying to figure out where I saw it. I think it was Flickrbackfillerbot that had added the coordinates. As they weren't visible on the description, I had added some myself, and then the template complained about a difference.
∞∞ Enhancing999 (talk) 17:55, 1 September 2024 (UTC)

@Enhancing999: BTW: If there are SDC coordinates, it seems to be sufficient to add empty {{Location}}/{{Object location}} templates - see test at Revision #918241588. Fl.schmitt (talk) 18:35, 1 September 2024 (UTC)

Yes, the question is to which files.
∞∞ Enhancing999 (talk) 18:41, 1 September 2024 (UTC)

One brute-force solution is to parse the database dumps - both the wikibase entities and wikitext content - and determine the files -- DaxServer (talk) 18:50, 1 September 2024 (UTC)
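Once both dumps are parsed, the comparison itself is a filter. This is a sketch, assuming the dump parsing has produced two hypothetical mappings: file title to the set of SDC property IDs it uses, and file title to its wikitext. Which templates count as "displaying coordinates" is an assumption: here {{Location}} and {{Object location}} plus their "dec" variants. Other coordinate templates would need to be added to the pattern.

```python
import re

# Assumed set of coordinate templates; first letter case-insensitive, " dec" variants allowed.
LOCATION_TEMPLATE = re.compile(
    r"\{\{\s*(?:[Ll]ocation|[Oo]bject[ _]location)(?:[ _]dec)?\s*[|}]"
)


def files_missing_location(
    sdc_props: dict[str, set[str]],
    wikitexts: dict[str, str],
    coord_props: tuple[str, ...] = ("P1259", "P625"),
) -> list[str]:
    """Titles with a coordinate statement in SDC but no location template in the wikitext."""
    missing = []
    for title, props in sdc_props.items():
        if not any(p in props for p in coord_props):
            continue  # no SDC coordinates, nothing to display
        if not LOCATION_TEMPLATE.search(wikitexts.get(title, "")):
            missing.append(title)
    return sorted(missing)
```

Each title found this way could then get an empty {{Location}} or {{Object location}}, following the test revision mentioned above.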

Maybe we need to have MediaWiki re-add Category:Pages with coordinates to files.
∞∞ Enhancing999 (talk) 18:55, 1 September 2024 (UTC)
