img2csv

Turns a set of table images into one CSV, powered by Tesseract.

Warning: it is a serious CPU hog because of parallelization.

Usage

Put the images in page.d, with filenames that sort in page order.
If necessary, preprocess the images so that they contain only the tables.
Start the conversion: ./run.sh
The CSV is written to result.csv
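
The README does not say how run.sh orders the files in page.d. Assuming they are picked up in plain lexicographic order, unpadded numbers sort wrongly (page10.png comes before page2.png). A minimal sketch of a renaming helper follows; zero_pad and the width of 4 are hypothetical and not part of this repo:

```python
import os
import re


def zero_pad(name: str, width: int = 4) -> str:
    """Zero-pad the last run of digits in the filename stem.

    Hypothetical helper: makes lexicographic order match numeric order.
    """
    stem, ext = os.path.splitext(name)
    matches = list(re.finditer(r"\d+", stem))
    if not matches:
        return name  # nothing to pad
    last = matches[-1]
    return stem[:last.start()] + last.group().zfill(width) + stem[last.end():] + ext


names = ["page10.png", "page2.png", "page1.png"]
print(sorted(names))                        # unpadded: page10 sorts before page2
print(sorted(zero_pad(n) for n in names))   # padded: numeric page order
```

Rename the files to the padded names before putting them in page.d if the order matters.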

How it works

Chops each table into cells.
OCRs each cell with Tesseract.
Puts the OCRed cell text back in its original position and generates the CSV.
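
The last two steps can be sketched roughly as below. This is not the code in chop.py or ocr.py, only an illustration under stated assumptions: cells are keyed by a (row, column) position, fake_ocr is a hypothetical stand-in for the Tesseract call, and a thread pool is used for brevity (the real tool may parallelize differently).

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor


def fake_ocr(cell):
    """Hypothetical stand-in for Tesseract: here a 'cell' is already a string."""
    return cell.strip()


def cells_to_csv(cells, ocr=fake_ocr, workers=4):
    """Run OCR on every cell in parallel and lay the texts out as CSV.

    cells maps (row, col) -> cell image. Positions with no cell stay empty.
    """
    if not cells:
        return ""
    keys = sorted(cells)
    # pool.map preserves input order, so texts line up with keys.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        texts = list(pool.map(ocr, (cells[k] for k in keys)))
    n_rows = max(r for r, _ in keys) + 1
    n_cols = max(c for _, c in keys) + 1
    grid = [[""] * n_cols for _ in range(n_rows)]
    for (r, c), text in zip(keys, texts):
        grid[r][c] = text  # put each text back at its original position
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(grid)
    return buf.getvalue()


cells = {(0, 0): " Name ", (0, 1): "Qty", (1, 0): "Apple", (1, 1): " 3 "}
print(cells_to_csv(cells))
```

With real cell images, a function such as pytesseract.image_to_string could be passed as ocr, assuming the pytesseract wrapper is installed and the cells are PIL images.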
