isotrforever/NER-corpus-for-construction

NER corpus for construction

Basic information about the corpus

The corpus contains 759 sentences, 2268 named entities, 15213 words and 24839 characters.

The named entities are labelled in the "BIO" format.

All the data are in Construction-NER-corpus.csv, where words are listed in the first column and tags in the second.
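As a minimal sketch of how the two-column layout might be read, the Python below groups BIO-tagged words into entities. It assumes one word per row, no sentence-separator handling, and tags that begin with B, I, or O; the function names `read_bio_rows` and `extract_entities` and the sample rows are hypothetical and are not part of the corpus.

```python
import csv
import io


def read_bio_rows(handle):
    """Yield (word, tag) pairs from a two-column CSV: word first, tag second."""
    for row in csv.reader(handle):
        # Skip rows that lack a tag (for example, blank separator rows).
        if len(row) >= 2 and row[1].strip():
            yield row[0], row[1].strip()


def extract_entities(pairs):
    """Group BIO-tagged words into entities, each returned as a list of words."""
    entities, current = [], []
    for word, tag in pairs:
        prefix = tag[:1].upper()
        if prefix == "B":
            # A new entity starts; close the previous one if it is open.
            if current:
                entities.append(current)
            current = [word]
        elif prefix == "I" and current:
            # Continue the open entity.
            current.append(word)
        else:
            # "O", an unknown tag, or a stray "I" with no open entity.
            if current:
                entities.append(current)
            current = []
    if current:
        entities.append(current)
    return entities


# Made-up sample rows for illustration only (not taken from the corpus).
sample = io.StringIO("steel,B\nbar,I\nis,O\ninspected,O\nconcrete,B\n")
entities = extract_entities(read_bio_rows(sample))
print(entities)
```

To run this on the real file, the sample could be swapped for `open("Construction-NER-corpus.csv", newline="", encoding="utf-8")`; the encoding is an assumption and may need adjusting.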

The raw corpus was collected from a series of supervision documents of a construction project.

KAPPA test

The principles guiding the annotation are in Specification for annotation.docx, or in Specification(in English) for the English version.

A kappa test was applied to the corpus. The result is below:

      B     I     O
  B   2026  119   123
  I   70    2763  174
  O   18    46    9874
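The table does not state the kappa value itself. As a rough sketch, assuming the table is a confusion matrix between two annotations (rows for one, columns for the other) and that Cohen's kappa is the intended statistic, the value could be computed as follows. The function name `cohen_kappa` is hypothetical; the counts are copied from the table above.

```python
# Counts copied from the table; the row/column interpretation is an assumption.
matrix = [
    [2026, 119, 123],   # B
    [70, 2763, 174],    # I
    [18, 46, 9874],     # O
]


def cohen_kappa(m):
    """Cohen's kappa for a square agreement matrix given as a list of rows."""
    total = sum(sum(row) for row in m)
    # Observed agreement: share of items on the diagonal.
    observed = sum(m[i][i] for i in range(len(m))) / total
    # Chance agreement: product of the marginal proportions, summed over labels.
    row_sums = [sum(row) for row in m]
    col_sums = [sum(col) for col in zip(*m)]
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    return (observed - expected) / (1 - expected)


total_items = sum(sum(row) for row in matrix)
kappa = cohen_kappa(matrix)
print(total_items, f"{kappa:.3f}")
```

The counts sum to 15213, which matches the word count reported for the corpus, and the first row sums to 2268, the reported number of named entities. Under the assumptions above, the computed kappa is about 0.928.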

About the author

If you have any questions, please feel free to contact us: qqz@zju.edu.cn
