GitHub - diverged/USPTO-Bulk-Data-Tool: Configurable and concurrent processing tool for USPTO bulk patent zip files.

USPTO-Bulk-Data-Tool

A configurable tool for concurrent processing of U.S. Patent and Trademark Office (USPTO) bulk data zip files.

At this time, the tool supports the following USPTO bulk data products:

Patent Grant Full Text Data (No Images) (2004 - Present)
Patent Application Full Text Data (No Images) (2004 - Present)

Given a directory of USPTO zip files, the application will produce one of the following outputs:

Complete XML files of individual documents split out from the zip
JSON files of individual documents
- Selective (non-exhaustive) parsing of main document fields
- Structured patent claims representing referential relationships, as in the original PatentPublicData tool
- HTML formatting of Abstract and Description fields
Apache Parquet files corresponding to bulk zip files

Usage

Clone this repository. Edit the config.toml as needed - the most important config values are the first three:

[required]
inputdirectory = "data/in"
outputdirectory = "data/out"
outputmode = "json"

For the most basic setup, create data/in directories within the project root, and populate the /in directory with zip files to process.

Then, from the root of project directory:

make run

For more advanced usage running the application from somewhere other than the root of the project directory, the executable accepts a single optional argument specifying the path to a config.toml file.

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
cmd		cmd
internal		internal
.gitignore		.gitignore
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
config.toml		config.toml
go.mod		go.mod
go.sum		go.sum

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

USPTO-Bulk-Data-Tool

Usage

License

About

Releases

Packages

Languages

License

diverged/USPTO-Bulk-Data-Tool

Folders and files

Latest commit

History

Repository files navigation

USPTO-Bulk-Data-Tool

Usage

License

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages