Project ideas

Stefan Weil edited this page Apr 10, 2023 · 4 revisions

Ideas for further development of Tesseract

Conversion of neural networks for text recognition to and from Tesseract

As long as a network uses only features that Tesseract implements, it should be possible to convert models made for Keras or TensorFlow to Tesseract and vice versa.

Maybe ONNX can be used as a common exchange format:

Related issues:

Support additional image formats

Tesseract uses Leptonica, which can read many important image formats. Relevant Leptonica API functions: pixRead, more?
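To make "which formats are already covered" concrete, here is a minimal sketch that guesses an image format from its leading bytes. The signature table and the function name `sniff_format` are illustrative assumptions, not taken from Leptonica; the formats a given Leptonica build can actually read depend on its build options, and Leptonica's own detection remains authoritative.

```python
from typing import Optional

# Leading bytes of some common raster formats (illustrative, not exhaustive).
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"BM", "bmp"),
    (b"II*\x00", "tiff"),  # little-endian TIFF
    (b"MM\x00*", "tiff"),  # big-endian TIFF
]


def sniff_format(header: bytes) -> Optional[str]:
    """Guess the image format from the first bytes of a file (hypothetical helper)."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    # WebP is a RIFF container with "WEBP" at byte offset 8.
    if header[:4] == b"RIFF" and header[8:12] == b"WEBP":
        return "webp"
    # PNM family: "P1" through "P6".
    if len(header) >= 2 and header[:1] == b"P" and header[1:2] in b"123456":
        return "pnm"
    return None
```

A caller could read the first 12 bytes of a file, pass them to `sniff_format`, and treat a `None` result as a candidate for the missing formats listed below.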

Missing formats:

Extending Leptonica to support additional image formats is not desired, because each format takes considerable effort to implement and maintain. But it may be possible to use an external library for image handling. Then only support for that one library would have to be implemented.
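One possible shape for such an integration, sketched under stated assumptions: the external library decodes the unsupported format into raw pixels, and a thin adapter wraps those pixels in a trivial format that Leptonica already reads, such as binary PGM, which could then be handed over in memory (for example through pixReadMem). The function name `gray_to_pgm` and the 8-bit grayscale, row-major input layout are hypothetical choices for illustration, not an existing Tesseract or Leptonica interface.

```python
def gray_to_pgm(width: int, height: int, pixels: bytes) -> bytes:
    """Wrap 8-bit grayscale, row-major pixel data in a binary PGM (P5) container.

    Hypothetical adapter: `pixels` is assumed to come from an external decoder.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    if len(pixels) != width * height:
        raise ValueError("pixel buffer does not match width * height")
    # P5 header: magic number, dimensions, maximum gray value, then raw bytes.
    header = f"P5\n{width} {height}\n255\n".encode("ascii")
    return header + bytes(pixels)
```

With this design, adding a new input format would only require the external library to decode it; nothing format-specific would have to be added to Leptonica or Tesseract. Color images would need an analogous PPM (P6) wrapper.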

Possible libraries: