LayoutLMv2 (Document Foundation Model)

Multimodal (text + layout/format + image) pre-training for Document AI

Introduction

LayoutLMv2 is an improved version of LayoutLM with new pre-training tasks to model the interaction among text, layout, and image in a single multi-modal framework. It outperforms strong baselines and achieves new state-of-the-art results on a wide variety of downstream visually-rich document understanding tasks, including FUNSD (0.7895 → 0.8420), CORD (0.9493 → 0.9601), SROIE (0.9524 → 0.9781), Kleister-NDA (0.834 → 0.852), RVL-CDIP (0.9443 → 0.9564), and DocVQA (0.7295 → 0.8672).

LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding, Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou, ACL 2021

Models

layoutlmv2-base-uncased | HuggingFace
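
The released checkpoint can be loaded through the HuggingFace transformers library. Below is a minimal sketch (not part of this repository), assuming transformers together with detectron2 and pytesseract are installed, since the processor runs OCR and the model uses a visual backbone:

from PIL import Image
from transformers import LayoutLMv2Processor, LayoutLMv2Model

processor = LayoutLMv2Processor.from_pretrained("microsoft/layoutlmv2-base-uncased")
model = LayoutLMv2Model.from_pretrained("microsoft/layoutlmv2-base-uncased")

image = Image.open("document.png").convert("RGB")  # any scanned page image
encoding = processor(image, return_tensors="pt")   # OCR + tokenization + word bounding boxes
outputs = model(**encoding)
print(outputs.last_hidden_state.shape)             # (1, text tokens + visual tokens, 768)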

Fine-tuning Example on FUNSD

Installation

Please refer to layoutlmft

Command

cd layoutlmft
python -m torch.distributed.launch --nproc_per_node=4 examples/run_funsd.py \
        --model_name_or_path microsoft/layoutlmv2-base-uncased \
        --output_dir /tmp/test-ner \
        --do_train \
        --do_predict \
        --max_steps 1000 \
        --warmup_ratio 0.1 \
        --fp16
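
After fine-tuning, the checkpoint written to --output_dir can be used for inference on new forms. The following is a hedged sketch (not included in layoutlmft), assuming the HuggingFace transformers API with detectron2 and pytesseract available; label names are taken from the checkpoint's id2label mapping:

import torch
from PIL import Image
from transformers import LayoutLMv2Processor, LayoutLMv2ForTokenClassification

processor = LayoutLMv2Processor.from_pretrained("microsoft/layoutlmv2-base-uncased")
model = LayoutLMv2ForTokenClassification.from_pretrained("/tmp/test-ner")  # --output_dir above

image = Image.open("form.png").convert("RGB")
encoding = processor(image, return_tensors="pt")    # OCR + tokenization + word bounding boxes
with torch.no_grad():
    logits = model(**encoding).logits               # (1, text tokens, num_labels)
predictions = logits.argmax(-1).squeeze().tolist()
tokens = processor.tokenizer.convert_ids_to_tokens(encoding["input_ids"].squeeze().tolist())
print([(t, model.config.id2label[p]) for t, p in zip(tokens, predictions)])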

Results

FUNSD (field-level)

Model                      Precision   Recall   F1
bert-base-uncased          0.5469      0.6710   0.6026
unilmv2-base-uncased       0.6349      0.6975   0.6648
layoutlm-base-uncased      0.7597      0.8155   0.7866
layoutlmv2-base-uncased    0.8029      0.8539   0.8276
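
The field-level scores above are entity-level precision/recall/F1 over BIO-tagged word sequences. As a point of reference, here is a minimal sketch (an assumption, not this repository's evaluation code) of how such scores can be computed with seqeval:

from seqeval.metrics import precision_score, recall_score, f1_score

# hypothetical gold and predicted BIO tags for the words of one form
y_true = [["B-QUESTION", "I-QUESTION", "O", "B-ANSWER", "I-ANSWER"]]
y_pred = [["B-QUESTION", "I-QUESTION", "O", "B-ANSWER", "O"]]

print(precision_score(y_true, y_pred))  # 0.5: 1 of 2 predicted entities matches exactly
print(recall_score(y_true, y_pred))     # 0.5: 1 of 2 gold entities is recovered
print(f1_score(y_true, y_pred))         # 0.5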

Citation

If you find LayoutLMv2 useful in your research, please cite the following paper:

@inproceedings{Xu2020LayoutLMv2MP,
  title     = {LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding},
  author    = {Yang Xu and Yiheng Xu and Tengchao Lv and Lei Cui and Furu Wei and Guoxin Wang and Yijuan Lu and Dinei Florencio and Cha Zhang and Wanxiang Che and Min Zhang and Lidong Zhou},
  booktitle = {Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics (ACL) 2021},
  year      = {2021}
}

License

The content of this project itself is licensed under the Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.

Contact Information

For help or issues using LayoutLMv2, please submit a GitHub issue.

For other communications related to LayoutLMv2, please contact Lei Cui (lecu@microsoft.com) or Furu Wei (fuwei@microsoft.com).