Implement EXIF extraction for webp files
Closed, Resolved · Public · Feature

Description

WebP files can contain EXIF information, but we currently do not extract it and store it in the file's metadata column.

Example file: https://commons.wikimedia.org/wiki/File:Universiade_(大运)_Exit_A_F1_(2022-08-16).webp

This file contains EXIF information, but it is not extracted from the file and stored in the database.

The following API query shows that the information is not extracted:
https://commons.wikimedia.org/wiki/Special:ApiSandbox#action=query&format=json&prop=imageinfo&titles=File%3AUniversiade_(大运)_Exit_A_F1_(2022-08-16).webp&formatversion=2&iiprop=timestamp%7Cuser%7Cmetadata%7Cextmetadata

This should be implemented in the WebPHandler class:
https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/core/+/refs/heads/master/includes/media/WebPHandler.php#99

The specification for the WebP RIFF container is:
https://developers.google.com/speed/webp/docs/riff_container
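As a rough illustration of that container format, here is a Python sketch (not MediaWiki code; the function names are hypothetical) that walks the RIFF chunks and pulls out the raw `EXIF` and `XMP ` payloads. It assumes the layout from the specification: a 12-byte `RIFF` / size / `WEBP` header, then chunks made of a FourCC, a 32-bit little-endian payload size, the payload, and one padding byte when the size is odd.

```python
import struct

def iter_riff_chunks(data: bytes):
    """Yield (fourcc, payload) for each chunk after the 12-byte RIFF/WEBP header."""
    if len(data) < 12 or data[0:4] != b"RIFF" or data[8:12] != b"WEBP":
        raise ValueError("not a RIFF/WEBP file")
    # The RIFF size field counts every byte after the first 8 (i.e. from "WEBP" on).
    (riff_size,) = struct.unpack("<I", data[4:8])
    end = min(len(data), 8 + riff_size)
    pos = 12
    while pos + 8 <= end:
        fourcc = data[pos:pos + 4]
        (size,) = struct.unpack("<I", data[pos + 4:pos + 8])
        yield fourcc, data[pos + 8:pos + 8 + size]
        # Payloads of odd size are followed by one padding byte not counted in size.
        pos += 8 + size + (size & 1)

def find_metadata_chunks(data: bytes) -> dict:
    """Return the raw payloads of the first 'EXIF' and 'XMP ' chunks, if present."""
    wanted = {b"EXIF": "exif", b"XMP ": "xmp"}
    found = {}
    for fourcc, payload in iter_riff_chunks(data):
        key = wanted.get(fourcc)
        if key is not None and key not in found:
            found[key] = payload
    return found

def make_chunk(fourcc: bytes, payload: bytes) -> bytes:
    """Helper for building synthetic test input (not needed for parsing)."""
    pad = b"\x00" if len(payload) % 2 else b""
    return fourcc + struct.pack("<I", len(payload)) + payload + pad
```

Note the trailing space in the `XMP ` FourCC, and that skipping the padding byte matters: without it, every chunk after an odd-sized one would be misread.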

Event Timeline

TheDJ triaged this task as Low priority. Dec 23 2023, 2:54 PM
TheDJ updated the task description.
TheDJ changed the subtype of this task from "Task" to "Feature Request".
TheDJ moved this task from Backlog to Metadata parsing on the MediaWiki-File-management board.
TheDJ added a project: Commons.

Related upstream tickets:
https://github.com/libgd/libgd/issues/452
https://github.com/libexif/libexif/issues/58

Not looking promising, and there are similar problems for AVIF and HEIC/HEIF.

Could try extracting the data chunk, prefixing it with `"Exif\0\0"`, and then doing `$exif = exif_read_data("data://image/jpeg;base64," . base64_encode($image));`

Honestly, this part of PHP is still stuck in the PHP 5.2 age ;)

I attempted this workaround, but I wasn’t able to make it work.
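For anyone retrying this, a hedged Python sketch of one variant: instead of only prefixing `Exif\0\0`, wrap the chunk in a minimal JPEG (SOI marker, one APP1 segment, EOI marker) so the buffer at least begins with a JPEG signature. It is an assumption, not something verified here, that a bare prefix fails because the buffer does not look like a JPEG, and that PHP's exif_read_data accepts an image-less stub like this. The function names are hypothetical, and the prefix is only added when the chunk does not already carry it.

```python
import base64
import struct

EXIF_HEADER = b"Exif\x00\x00"

def exif_chunk_to_jpeg_stub(exif_payload: bytes) -> bytes:
    """Wrap a WebP EXIF chunk payload in a minimal JPEG: SOI, APP1, EOI."""
    # Add the "Exif\0\0" identifier only if the chunk does not already start with it.
    if not exif_payload.startswith(EXIF_HEADER):
        exif_payload = EXIF_HEADER + exif_payload
    # The APP1 length field is big-endian and counts its own 2 bytes plus the data.
    length = len(exif_payload) + 2
    if length > 0xFFFF:
        raise ValueError("Exif data too large for a single APP1 segment")
    return b"\xff\xd8" + b"\xff\xe1" + struct.pack(">H", length) + exif_payload + b"\xff\xd9"

def to_data_uri(jpeg_bytes: bytes) -> str:
    """Build the data:// URI form used in the suggestion above."""
    return "data://image/jpeg;base64," + base64.b64encode(jpeg_bytes).decode("ascii")
```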

Change #1030525 had a related patch set uploaded (by Brian Wolff; author: Brian Wolff):

[mediawiki/core@master] Extract XMP & Exif from WebP files

https://gerrit.wikimedia.org/r/1030525

Change #1030525 merged by jenkins-bot:

[mediawiki/core@master] Extract XMP & Exif from WebP files

https://gerrit.wikimedia.org/r/1030525

Thanks, Brian. I was so close before; I can't believe I didn't think of combining the TIFF parser with stripping the `Exif\0\0` prefix. Good idea to check the ExifTool parser.
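A minimal Python sketch of that combination, for illustration only (it is not the code from change 1030525, which is PHP in WebPHandler): strip an optional `Exif\0\0` prefix so offsets become TIFF-relative, read the TIFF header (byte order `II` or `MM`, magic 42, IFD0 offset), then read the IFD0 entries and the offset of the next IFD (IFD1, if any). `parse_tiff_ifd0` is a hypothetical name; only single SHORT and LONG values are decoded, everything else is returned as the raw 4-byte field.

```python
import struct

EXIF_HEADER = b"Exif\x00\x00"

def parse_tiff_ifd0(exif_payload: bytes):
    """Return (byte_order, {tag: (type, count, value)}, next_ifd_offset)."""
    # Strip the optional JPEG-style identifier so offsets are relative to the TIFF header.
    tiff = exif_payload[len(EXIF_HEADER):] if exif_payload.startswith(EXIF_HEADER) else exif_payload
    order = tiff[:2]
    if order == b"II":
        endian = "<"
    elif order == b"MM":
        endian = ">"
    else:
        raise ValueError("bad TIFF byte order")
    magic, ifd0_offset = struct.unpack(endian + "HI", tiff[2:8])
    if magic != 42:
        raise ValueError("bad TIFF magic")
    (count,) = struct.unpack(endian + "H", tiff[ifd0_offset:ifd0_offset + 2])
    entries = {}
    pos = ifd0_offset + 2
    for _ in range(count):
        # Each IFD entry: tag, type, count, then a 4-byte value-or-offset field.
        tag, typ, n, raw = struct.unpack(endian + "HHI4s", tiff[pos:pos + 12])
        if typ == 3 and n == 1:      # SHORT: stored in the first two bytes of the field
            value = struct.unpack(endian + "H", raw[:2])[0]
        elif typ == 4 and n == 1:    # LONG: fills the whole field
            value = struct.unpack(endian + "I", raw)[0]
        else:                        # other types/counts: keep the raw field (often an offset)
            value = raw
        entries[tag] = (typ, n, value)
        pos += 12
    (next_ifd,) = struct.unpack(endian + "I", tiff[pos:pos + 4])
    return order.decode("ascii"), entries, next_ifd
```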

We should probably add a note to the release notes before closing the ticket.

TheDJ assigned this task to Bawolff.
exiftool -a -u -g1  /Users/djhartman/Downloads/Xili_Eco_Park.webp
---- ExifTool ----
ExifTool Version Number         : 12.76
---- System ----
File Name                       : Xili_Eco_Park.webp
Directory                       : /Users/djhartman/Downloads
File Size                       : 18 MB
File Modification Date/Time     : 2024:05:15 14:04:27+02:00
File Access Date/Time           : 2024:05:15 14:04:30+02:00
File Inode Change Date/Time     : 2024:05:15 14:04:27+02:00
File Permissions                : -rw-r--r--
---- File ----
File Type                       : Extended WEBP
File Type Extension             : webp
MIME Type                       : image/webp
Exif Byte Order                 : Big-endian (Motorola, MM)
---- RIFF ----
WebP Flags                      : XMP, EXIF
Image Width                     : 4096
Image Height                    : 3072
Image Width                     : 4096
Image Height                    : 3072
---- IFD0 ----
Y Resolution                    : 72
X Resolution                    : 72
Image Width                     : 0
Image Height                    : 0
Orientation                     : Horizontal (normal)
Resolution Unit                 : inches
Modify Date                     : 2023:11:25 20:54:21
---- ExifIFD ----
Color Space                     : Uncalibrated
Exif Image Width                : 4096
Exif Image Height               : 3072
Light Source                    : Unknown
---- IFD1 ----
Y Resolution                    : 72
Compression                     : JPEG (old-style)
X Resolution                    : 72
Resolution Unit                 : inches
---- XMP-x ----
XMP Toolkit                     : Adobe XMP Core 6.0-c003 116.ddc7bc4, 2021/08/17-13:18:37
---- XMP-xmp ----
Creator Tool                    : Adobe Photoshop 21.2 (Windows)
Create Date                     : 2023:09:30 21:25:39+08:00
Modify Date                     : 2023:11:25 20:54:21+08:00
Metadata Date                   : 2023:11:25 20:54:21+08:00
---- XMP-dc ----
Format                          : application/vnd.adobe.photoshop
---- XMP-photoshop ----
Color Mode                      : RGB
---- XMP-xmpMM ----
Instance ID                     : xmp.iid:5b650ae8-c3a2-7e43-9a51-142fa1b50708
Document ID                     : xmp.did:5b650ae8-c3a2-7e43-9a51-142fa1b50708
Original Document ID            : xmp.did:5b650ae8-c3a2-7e43-9a51-142fa1b50708
History Action                  : created
History Instance ID             : xmp.iid:5b650ae8-c3a2-7e43-9a51-142fa1b50708
History When                    : 2023:09:30 21:25:39+08:00
History Software Agent          : Adobe Photoshop 21.2 (Windows)
---- XMP-tiff ----
Orientation                     : Horizontal (normal)
X Resolution                    : 72
Y Resolution                    : 72
Resolution Unit                 : inches
---- XMP-exif ----
Color Space                     : Uncalibrated
Exif Image Width                : 4096
Exif Image Height               : 3072
---- Composite ----
Image Size                      : 4096x3072
Megapixels                      : 12.6
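The "WebP Flags" line and the RIFF Image Width/Height above come from the VP8X chunk. As a sketch, assuming the bit layout in the container specification linked in the description (first byte: ICC 0x20, Alpha 0x10, EXIF 0x08, XMP 0x04, Animation 0x02; then 3 reserved bytes; then canvas width and height minus one as 24-bit little-endian values), decoding it in Python could look like this. `parse_vp8x` is a hypothetical helper name.

```python
# VP8X flag bits in the first payload byte, per the WebP container specification.
VP8X_FLAGS = {
    0x20: "ICC",
    0x10: "Alpha",
    0x08: "EXIF",
    0x04: "XMP",
    0x02: "Animation",
}

def parse_vp8x(payload: bytes):
    """Return (flag_names, canvas_width, canvas_height) from a VP8X chunk payload."""
    if len(payload) < 10:
        raise ValueError("VP8X payload must be at least 10 bytes")
    flags = payload[0]
    names = [name for bit, name in VP8X_FLAGS.items() if flags & bit]
    # Canvas width/height are stored minus one, as 24-bit little-endian integers.
    width = int.from_bytes(payload[4:7], "little") + 1
    height = int.from_bytes(payload[7:10], "little") + 1
    return names, width, height
```

For example, a flags byte of 0x0C with stored values 4095 and 3071 would decode to EXIF and XMP at 4096x3072, which is what ExifTool reports in the RIFF group for the first file.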
exiftool -a -u -g1  /Users/djhartman/Downloads/Changzhen_\(长圳\)_Outlook_\(2022-06-07\).webp
---- ExifTool ----
ExifTool Version Number         : 12.76
---- System ----
File Name                       : Changzhen_(长圳)_Outlook_(2022-06-07).webp
Directory                       : /Users/djhartman/Downloads
File Size                       : 2.7 MB
File Modification Date/Time     : 2024:05:15 14:04:44+02:00
File Access Date/Time           : 2024:05:15 14:04:45+02:00
File Inode Change Date/Time     : 2024:05:15 14:04:44+02:00
File Permissions                : -rw-r--r--
---- File ----
File Type                       : Extended WEBP
File Type Extension             : webp
MIME Type                       : image/webp
Exif Byte Order                 : Big-endian (Motorola, MM)
---- RIFF ----
WebP Flags                      : EXIF
Image Width                     : 5792
Image Height                    : 4344
VP8 Version                     : 0 (bicubic reconstruction, normal loop)
Image Width                     : 5792
Horizontal Scale                : 0
Image Height                    : 4344
Vertical Scale                  : 0
---- IFD0 ----
Image Width                     : 0
Image Height                    : 0
Orientation                     : Unknown (0)
---- ExifIFD ----
Light Source                    : Unknown
---- Composite ----
Image Size                      : 5792x4344
Megapixels                      : 25.2

So it looks like we give the IFD0 info precedence, but that info is actually broken in these files?

I think IFD1 is for the embedded thumbnail (if present), not the actual image, so IFD1 is irrelevant here.

I guess we could have the RIFF tags override; they would be the source of truth.

It's a little weird that we extract the technical data from Exif at all.
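To make the "RIFF tags override" idea concrete, here is a hypothetical Python sketch. It assumes Exif tags are available as a plain tag-number to value dict and that the canvas size has already been read from the RIFF/VP8X chunk; `reconcile_dimensions` is a made-up name, not a MediaWiki API. It overwrites TIFF ImageWidth (0x0100) and ImageLength (0x0101) with the container values and reports any disagreement, such as the 0 x 0 that IFD0 shows in both files above.

```python
IMAGE_WIDTH = 0x0100   # TIFF ImageWidth tag
IMAGE_LENGTH = 0x0101  # TIFF ImageLength (height) tag

def reconcile_dimensions(exif_tags: dict, riff_width: int, riff_height: int):
    """Return (merged_tags, conflicts), treating the RIFF canvas size as authoritative.

    conflicts maps each dimension tag whose Exif value differed from the RIFF
    value to the discarded Exif value, e.g. for logging.
    """
    riff = {IMAGE_WIDTH: riff_width, IMAGE_LENGTH: riff_height}
    conflicts = {
        tag: exif_tags[tag]
        for tag in riff
        if tag in exif_tags and exif_tags[tag] != riff[tag]
    }
    merged = dict(exif_tags)
    merged.update(riff)  # the container values win over whatever IFD0 claims
    return merged, conflicts
```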

Yeah, that seems like something we can fix in the presentation layer if we choose to. It's not as if the extraction is actually failing, which is what I initially suspected when I saw this.

@TheDJ If it's just that file being broken, I think it makes sense to leave it the way it is, so I'm going to re-close this bug (but please reopen it if you disagree).