[go: nahoru, domu]

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix empty PDF issue with Puppeteer ^23.0.0 #876

Merged
merged 4 commits into from
Aug 22, 2024

Conversation

JeppeKnockaert
Copy link
Contributor
@JeppeKnockaert JeppeKnockaert commented Aug 21, 2024

This pull request reintroduces the changes done for Puppeteer v23, while fixing the issue introduced in #870 resulting in a white PDF.

The issue was caused by not using base64 to communicate between puppeteer and Browsershot.

I also added a test that actually checks the contents of the PDF. To do this, I introduced the pdf-to-text package used in https://github.com/spatie/laravel-pdf and also added the same note in the README.

There is still one flaky test: 'can handle a permissions error with full output', which expects no errors, but sometimes gets a 404 on the favicon of example.com (because they have not favicon). It doesn't always fail, because the request for a favicon happens asynchronously.

@bluesheep100
Copy link

I discovered this as well last night, but it was quite late, so I didn't have time to PR it :(
Good work though. Are you sure there's no way to write this new test without introducing another outside dependency? (pdftotext)

I'm not personally sure why exactly this issue manifested as it did, but as far as I could tell, attempting to return the raw buffer from the JS side would mess up the encoding or something, because the broken PDFs are encoded as UTF-8, whereas the working one (as decoded from base64 by PHP) is ANSI.

@JeppeKnockaert
Copy link
Contributor Author
JeppeKnockaert commented Aug 22, 2024

Good point, I looked at it again and tried to see if I could just assert the binary data to contain the text. This works, but it also works with the empty PDF, resulting in a false positive. The thing that causes it to be a white page seems to be very subtle, because all the content is actually there in the binary data.

@bluesheep100
Copy link

I don't know much of anything about the PDF structure, but it appears they come out as blocks of stream data, prefixed with a length and "stream" keyword. I'd assume the difference in file encoding might affect that stream data and make it not display, while the binary remains identical.

@freekmurze
Copy link
Member

Very nice, thank you!

@freekmurze freekmurze merged commit 601f275 into spatie:main Aug 22, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants