MalDocA is a library to parse and extract features from Microsoft Office documents. It supports both OLE and OOXML documents.
The project's goal is to analyze potentially malicious documents to improve user safety and security.
Some testdata files contain malicious code! Hence, we use a xor-encoding for some testdata files as a safety measure (key = 0x42). Therefore, be very careful when opening / processing them!
For convenience, we provide a python script (“testdata_encode.py”) to encode / decode those files. The script's output is stored in the same path, having “_xored” as file name appendix. Keep in mind that encoding a file twice decodes it again, i.e. restores the original file.
Example usage: python testdata_encode.py maldoca/service/testdata/c98661bcd5bd2e5df06d3432890e7a2e8d6a3edcb5f89f6aaa2e5c79d4619f3d.docx
git clone --recurse-submodules https://github.com/google/maldoca.git
cd maldoca
Linux: bazel build --config=linux //maldoca/...
Windows: bazel build --config=windows //maldoca/...
Linux: bazel test --config=linux //maldoca/...
Windows: bazel test --config=windows //maldoca/...
We provide a docker file in “docker/Dockerfile”. This is the reference platform we use for continuous integration and optionally (arguably recommended) for development as well. Please check the documentation in “docker/Dockerfile” on how to build and use for development.
This is not an official Google product.