pdf2pdfocr

A tool that runs OCR on a PDF (or a supported image) and adds a text "layer" (a "PDF sandwich") to the original file, making it a searchable PDF. The script uses only open source tools.

donations

This software is free, but if you like it, please donate to support new features.

Bitcoin (BTC) address: 173D1zQQyzvCCCek9b1SpDvh7JikBEdtRJ

Ethereum (ETH) address: 0x94a0e2e4eac8406e81806a152593e492824adb95

Litecoin (LTC) address: LT63cQRUZ8YgZZB5nVogEqQR91oUjHv9hN

Dogecoin (DOGE) address: DBNdvUptuZYMt7gb9HavCQovdsoxQzP6i6

Niobio Cash (NBR - http://niobio.money) address: N918uWiGba4ZcCBsc8nZrqhRaucjAZvhnMQ6WA7ubKoNhgNmWS1xn1pThP9HJG6rWqVEEWSPRkJff6dQjCEtbgtMP2Eudcr

installation

On Linux, installation is straightforward: just install the required packages. You can use the "install_command" script to copy the required files to "/usr/local/bin".

On macOS, you will need MacPorts. Install MacPorts and run:

xcode-select --install
sudo xcodebuild -license
# install correct macports from https://www.macports.org/install.php
sudo port selfupdate
# install tesseract (Portuguese included - please set up your preferred languages)
sudo port install git tesseract tesseract-por tesseract-osd tesseract-eng
# install python 3 and other dependencies
sudo port install python34 py34-pip poppler poppler-data ImageMagick wget 
# configure default python3 installer
sudo port select --set python3 python34
sudo port select --set pip pip34
# install libs (ignore warning messages)
sudo pip install reportlab
sudo pip install pypdf2
# install pdftk (may fail on newer macOS)
sudo port install pdftk
# if it fails, please download the installer package and install pdftk manually
# for macOS versions older than 10.11
  wget https://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/pdftk_server-2.02-mac_osx-10.6-setup.pkg
# for macOS 10.11 and newer (http://stackoverflow.com/questions/32505951/pdftk-server-on-os-x-10-11)
  wget https://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/pdftk_server-2.02-mac_osx-10.11-setup.pkg

Note that wget and pdftk are optional. The MacPorts version of pdftk won't build on macOS 10.11 and above, so on those versions you have to install it manually from the package downloaded by the commands above.
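To confirm that the external programs are reachable after installation, each one can be looked up on the PATH. The sketch below is not part of pdf2pdfocr. The executable names are assumptions derived from the packages installed above (`pdftoppm` ships with poppler, `convert` with ImageMagick), so adjust the lists to match your system.

```python
import shutil

# Assumed executable names, derived from the packages listed above.
# They are not taken from the pdf2pdfocr source code.
REQUIRED_TOOLS = ["tesseract", "pdftoppm", "convert"]
OPTIONAL_TOOLS = ["pdftk", "wget"]


def find_missing(tools, which=shutil.which):
    """Return the entries of `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    for label, tools in (("required", REQUIRED_TOOLS), ("optional", OPTIONAL_TOOLS)):
        missing = find_missing(tools)
        status = "all found" if not missing else "missing: " + ", ".join(missing)
        print(f"{label} tools: {status}")
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use the default `shutil.which` is enough.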

On Windows, you will need to install the required software manually. Please read the document "Installing Windows tools for pdf2pdfocr" for a simple tutorial. It's also possible to run the tool from the "Send To" menu using the "pdf2pdfocr.vbs" script.

docker

The Dockerfile can be used to build a Docker image that runs pdf2pdfocr inside a container. To build the image, please download all the sources and run:

docker build -t leofcardoso/pdf2pdfocr:latest .

It's also possible to pull the Docker image from Docker Hub:

docker pull leofcardoso/pdf2pdfocr

You can run the application with "docker run":

docker run --rm -v "$(pwd):/home/docker" leofcardoso/pdf2pdfocr -v -i ./sample_file.pdf
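The "docker run" line above mounts the current directory at "/home/docker" and hands the remaining arguments to pdf2pdfocr. As a minimal sketch, the same invocation can be assembled from Python. `docker_command` is a hypothetical helper name; only the image name, the mount point, and the `-v -i` flags are taken from the command above.

```python
import os
import shlex

IMAGE = "leofcardoso/pdf2pdfocr"  # image name from the commands above


def docker_command(pdf_name, workdir=None):
    """Build the `docker run` argument list for one PDF located in `workdir`."""
    workdir = os.path.abspath(workdir or os.getcwd())
    return [
        "docker", "run", "--rm",
        "-v", f"{workdir}:/home/docker",  # same mount as "$(pwd):/home/docker"
        IMAGE,
        "-v", "-i", f"./{pdf_name}",
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so Docker is not required here.
    print(shlex.join(docker_command("sample_file.pdf")))
```

To actually run the container, pass the returned list to `subprocess.run(..., check=True)`. Building an argument list rather than a single string avoids shell quoting problems with file names that contain spaces.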

basic usage

The following command creates a searchable (OCR) PDF file in the same directory as "input_file".

pdf2pdfocr.py -i <input_file>

In some cases, you will want to set option flags. To view all the options, run:

pdf2pdfocr.py --help
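To apply the basic call to every PDF in a folder, the commands can be generated in a loop. The sketch below assumes "pdf2pdfocr.py" is on the PATH (for example after running "install_command") and uses only the `-i` flag shown above. `batch_commands` is a hypothetical helper name, not something the project provides.

```python
import shlex
from pathlib import Path


def batch_commands(folder, script="pdf2pdfocr.py"):
    """Return one `<script> -i <file>` argument list per PDF found in `folder`."""
    pdfs = sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )
    return [[script, "-i", str(pdf)] for pdf in pdfs]


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed first.
    for command in batch_commands("."):
        print(shlex.join(command))
```

Each returned list can be passed to `subprocess.run(..., check=True)`. Because the output is written to the same directory as the input, a second run over the same folder would also pick up the files generated by the first run, so filter those out according to the output naming your version uses.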
