Improvements to pdf parsing #364

nickscamara · 2024-07-04T01:28:53Z

Lots of people want to scrape PDF links, which we already support by default, but looking at some of the logs and comments on the docs, we still run into issues. It would be good to have more reliable methods plus more and better fallbacks.

I think the current implementation only gets the text from the html but doesn't convert it to markdown. It would be nice to attempt a conversion.
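
To make the "more/better fallbacks" idea concrete, here is a rough sketch of a fallback chain. It is not the existing implementation: `parse_with_fallbacks` and the `Extractor` type are hypothetical names, and the extractors themselves (for example a markdown converter first, then a plain-text extractor) are assumed to be supplied by the caller.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical type: an extractor takes raw PDF bytes and returns text or markdown.
Extractor = Callable[[bytes], str]


def parse_with_fallbacks(
    pdf_bytes: bytes,
    extractors: Sequence[Tuple[str, Extractor]],
) -> Tuple[str, str]:
    """Try each (name, extractor) pair in order.

    Returns (name, output) for the first extractor that produces
    non-empty output. Raises RuntimeError if every extractor fails.
    """
    errors: List[str] = []
    for name, extract in extractors:
        try:
            output = extract(pdf_bytes)
        except Exception as exc:  # any failure means "try the next method"
            errors.append(f"{name}: {exc}")
            continue
        if output and output.strip():
            return name, output
        # Whitespace-only output is treated as a failure too.
        errors.append(f"{name}: empty output")
    raise RuntimeError("all PDF extractors failed: " + "; ".join(errors))
```

Ordering the list from richest output (markdown) to most basic (plain text) would let the conversion be attempted first while still returning something when it fails, and the collected error messages could be logged to see which methods break most often.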

nickscamara added the enhancement New feature or request label Jul 4, 2024

Comments